From the Preface


Volume One was begun as the first contribution, by the German section 
of the International Commission for Mathematical Instruction, to the 
topic of the scientific foundations of instruction in mathematics, which 
was one of the topics chosen by the Commission, at a meeting in Paris in 
October 1954, in preparation for the International Congress of Mathe- 
maticians in Edinburgh in 1958. Originally we kept chiefly in mind the 
needs and interests of the instructor in mathematics, but as our cooperative 
effort continued from year to year, it became clear that the material in our 
book was equally important for mathematicians in science, government, 
and industry. For the best realization of our general purposes, each 
chapter has been written by two authors, one of them a university pro- 
fessor, the other an instructor with long experience in teaching. In addition 
to these specifically named authors, of whom there will eventually be more 
than one hundred, from Germany, Yugoslavia, the Netherlands, Austria, 
and Switzerland, important contributions have been made to each chapter, 
in joint semiannual sessions, by the other members of our large group of 
coworkers. 


H. Behnke 
K. Fladt 


PART A 


FOUNDATIONS OF MATHEMATICS 


1. Conceptions of the Nature of Mathematics 


1.1. Mathematics and Its Foundations 


In this section, which is an introduction to the work as a whole, we shall 
be discussing the foundations of mathematics. In other words, we are 
not doing mathematics here; we are talking about mathematics. We are 
engaged in a scientific activity that has received the appropriate name of 
metamathematics. 

Metamathematics forms a bridge between mathematics and philosophy. 
Some of its investigations can be carried out by mathematical methods, 
and to this extent the subject shares the exactness of mathematics, the 
most precise of all sciences. But other parts of metamathematics, among 
them the most fundamental, are not of a mathematical nature, so that we 
cannot expect them to have the absolute clarity of mathematics. As in 
all other branches of philosophy, the answers to many questions are to 
some extent a matter of subjective attitude and even of faith, and in any 
given period the attitude predominantly adopted is determined in part 
by the general spirit of the age. Fundamental philosophical concepts, 
such as idealism, realism, and nominalism, which for centuries have 
contended with one another with varying success, are reflected in the 
different views about the nature of mathematics. Apparently there is no 
hope of progress in an attempt to refute any one of these views scien- 
tifically; rather we try to characterize them as precisely and clearly as 
possible and in this way keep them apart. 

Studies about the foundations of mathematics have experienced a 
tremendous upsurge during the past hundred years, especially since the 
turn of the century. The chief impetus for these investigations was provided 
by the discovery of contradictions in the theory of sets, a mathematical 
discipline created during the nineteenth century in connection with
eventually successful attempts to clear up the nature of the real numbers. 
Since many of these paradoxes had already become apparent in antiquity, 
it is natural to ask why we are now able to deal with them successfully, 
whereas the ancients found them completely intractable. The answer is 
that the paradoxes necessarily remained intractable as long as they were 
expressed in one of the natural languages, such as English. On the shaky 
ground of such an imprecise language it is impossible to deal with questions 
of great subtlety, and our present-day successes are entirely due to a 
new instrument, the thoroughgoing formalization of mathematics. With 
this new tool it has at last become possible to construct metamathematical 
theories (for example, that of “classical” logic) which are just as exact
as the theories of ordinary mathematics. These new metamathematical 
theories are regarded by many mathematicians as the essential hallmark 
of present-day mathematics. 

In the following pages we shall describe some of the various conceptions 
of the nature of mathematics, but it must be remembered that they are 
only ex post facto idealizations of the nature of mathematics. All idealiza- 
tions are extreme in one direction or another, so that scarcely any mathe- 
matician will agree with every detail of any of the positions that we 
shall describe. Mathematics as it exists today is in fact the creation 
of scientists whose inspiration has come from the most varied sources. 
It is to this variety that mathematics owes its immense vitality. 


1.2. The Genetic Conception of Mathematics 


We first describe a conception of mathematics in which the central role 
is played by the human being and his capabilities, so that mathematics 
may almost be said to be a branch of psychology. For example, let us 
consider the subject of geometry. It is certainly true that the earliest 
knowledge of geometry, say among the Babylonians, depended on the 
empirical results of practical surveyors; it is easy to imagine, for instance, 
how the Pythagorean theorem could arise from individual observations. 
Yet at this stage the theorem can hardly be called mathematical, since the 
characteristic difference between a natural science and the purely abstract 
science of mathematics is considered to be that the statements of a natural 
science can be tested (directly or indirectly) by observation, whereas for 
mathematical statements such a test is regarded (for widely varying
reasons) as meaningless; mathematics is an a priori science in the sense
of Kant. Consequently, geometry was in its origins a natural science,
and was not “raised” to the position of an abstract, and therefore
mathematical, science until the time of the Greeks. It was they who under
the influence of Plato distinguished between axioms and the theorems 
derived from them. In their view the axioms were self-evident (cf. §1.3), 
and the theorems were derived by the process of logical deduction.
It is probable that the Greek mathematicians took the same attitude 
toward logic as is taken today by most “naive” mathematicians: in 
principle, the ability to reason logically is inborn but can be improved 
with practice. 

Arithmetic and many other branches of mathematics may well have 
begun like geometry as a collection of empirical facts, which was gradually 
raised to the status of a mathematical science. 

But mathematical sciences can arise in another way, which may be 
called intramathematical, to distinguish it from the natural sciences. 
One of the strongest impulses here is the inborn urge, experienced by 
most mathematicians and particularly well-developed among the Greeks, 
toward the sort of beauty that manifests itself in simplicity and symmetry. 
The mathematician feels compelled, while continuing to observe the 
demands of logic, to do away with exceptions. The desire to make the 
operations of subtraction and division universally applicable led to 
the rational numbers. Exceptions in the operation of passing to the limit 
no longer arose in the field of real numbers. The exceptional case of 
parallel lines was removed by the introduction of “infinitely distant” 
points, and in recent times the many exceptional cases arising from the 
existence of nondifferentiable functions have been avoided by the introduc- 
tion of distributions (cf. Vol. III, chap. 3, §3), which had already turned 
up among the physicists, in the form of the Dirac δ-function.

Most of these new mathematical entities, created to avoid the necessity 
for exceptional cases, were in the first place introduced more or less 
uncritically to meet the demands of each given case. But subsequently 
there arose a desire to establish the actual existence of such entities. 
A powerful tool here is the process of abstraction, which may be described 
as follows. Let there be given a set of entities which agree in many of 
their properties but differ in others. By an act that is in essence arbitrary, 
we shall declare that some of these properties, depending on the context 
in which we make the decision, are essential while all others are not 
essential. The act of “abstraction” from the nonessential properties 
consists of identifying (i.e., regarding as identical) those entities that differ 
only in nonessential properties. A set of such entities thus becomes a 
single unit and in this way a new entity is created (cf. §8.5). This act 
of creation, familiar to every present-day mathematician, may be regarded
as a general human capability. Here we shall only remark that in 
modern mathematics the process of abstraction, in conjunction with 
the search for simplicity, has led to the general structures that are 
to be found, for example, in the theory of groups (cf. §4.3, and Vol. IB, 
chap. 2). 
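
The act of abstraction described above can be illustrated by a small program. The following Python fragment is only a sketch under assumptions that do not come from the text: the entities are pairs of integers read as fractions, the property declared essential is the value of the quotient, and the names `essential` and `abstract` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def essential(pair):
    """The property declared essential: the value numerator/denominator."""
    numerator, denominator = pair
    return Fraction(numerator, denominator)


def abstract(entities, key):
    """Identify entities that differ only in nonessential properties.

    Returns a dict mapping each essential value to the set of entities
    sharing it; each such set is then treated as a single new entity.
    """
    classes = defaultdict(set)
    for entity in entities:
        classes[key(entity)].add(entity)
    return dict(classes)


pairs = [(1, 2), (2, 4), (3, 6), (1, 3), (2, 6)]
classes = abstract(pairs, essential)
# The five pairs are identified into two new entities.
```

The choice of `essential` is the arbitrary act mentioned in the text; a different choice (for example, the parity of the numerator) would produce different new entities from the same pairs.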


1.3. The Extent to Which Mathematical Propositions Are Self-Evident


As mentioned before, the Greeks divided valid mathematical propo- 
sitions into axioms and theorems derived therefrom. The axioms were 
considered self-evident, immediately obvious to everyone, “neither in 
need of proof nor admitting proof.” The theorems, on the other hand, 
were not immediately obvious in themselves but became evident by being 
derived from the axioms through a series of arguments, each of which 
was obviously valid. But today, as a result of the discovery of non- 
Euclidean geometries, hardly any mathematician holds to the obviousness 
of Euclidean geometry. The axioms of group theory, field theory, lattice 
theory, and so forth are no longer considered obvious. At most, the 
theorems of arithmetic, logic, and perhaps the theory of sets may appear 
evident (either directly or indirectly) to certain mathematicians. For 
example, the intuitionists, following L. E. J. Brouwer, require that every 
mathematical construction shall be so immediately apparent to the 
human mind, and the result so clear, that no further proof is necessary. 
In §4.7 we shall discuss the attempts that have been made to show that 
mathematics is free of inconsistencies. Clearly such a proof of con- 
sistency will be more widely accepted if it can be based on concepts 
intuitively apparent to everyone. 

To clarify these remarks, let us give an example of a statement that 
will be considered self-evident by many readers. Let there be given two 
distinct symbols, neither of which can be divided into meaningful parts. 
Then it will be considered self-evident that the two “words” obtained by
writing these symbols, first in the one order and then in the other, are 
distinct from each other. 


1.4. The Meaning of Mathematical Propositions 

In general, mathematicians are convinced that their propositions are 
meaningful, the extreme position in this respect being that of the so-called 
formalists, who consider mathematics to be a mere game with symbols, 
the rules of which, in the last analysis, are chosen arbitrarily (conven- 
tionalism). Formalism was introduced by Hilbert as a methodological 
principle whereby the concept of a proof of consistency could be clearly 
stated. The formalistic point of view can also be applied to physics if 
with H. Hertz¹ we define the task of theoretical physics as follows:
“Within our own minds we create images or symbols of the external
objects, and we construct them in such a way that the logically necessary
consequences of the images are again the images of the physically
necessary consequences of the objects.” In other words, we construct a
process parallel to the process of nature. But the essential feature here
is not that this process involves “logical thought” but rather that it runs
parallel to the process of nature. Thus we could equally well have chosen
a purely formalistic process, which of course would have to be suitably
constructed.

1 Die Prinzipien der Mechanik, Ambrosius Barth, Leipzig (1894), Introduction.

Although, as was stated before, the majority of mathematicians hold 
to the belief that mathematical propositions are not meaningless, they 
hold widely different opinions about their meaning. It is impossible to go 
into details here about these varied opinions, and we shall content ourselves 
with discussing a fundamental dividing line among them, having to do 
with the concept of infinity. If we adopt the concept of actual (completed) 
infinity, we may speak of the totality of all natural numbers just as readily, 
for example, as of the totality of natural numbers between 10 and 100. 
But those who hold to the concept of potential infinity emphasize that 
the infinite totality of all natural numbers as a set is not immediately 
available to us, and that we can only approach it step by step, by means of 
successive constructions, such as are indicated by 


|, ||, |||, ...


This is the so-called constructive point of view; compare the concept of an 
algorithm described in §5. 
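
The step-by-step approach to the natural numbers can be sketched as a program that never holds the whole totality but can always produce one more stroke figure. This is only an illustration; the name `strokes` and the use of a Python generator are our assumptions, not part of the text.

```python
from itertools import islice


def strokes():
    """Yield the stroke figures one construction step at a time, without end."""
    word = "|"
    while True:
        yield word
        word += "|"


# Only finitely many steps are ever carried out; here, the first four.
first_four = list(islice(strokes(), 4))
```

At no point does the program contain a completed infinite set; it contains only a rule for continuing the construction.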

If we examine these concepts further, certain other differences come 
to light, one of which we will now illustrate by an example. For any given 
natural number, we can determine in a finite number of steps whether 
the number is perfect or not.² The proposition:


(1.1) either there exists an odd perfect number between 10 and 100, or 
else there exists no odd perfect number between 10 and 100 


is acceptable from either the actual or the potential point of view. But 
matters are quite different for the proposition: 


(1.2) either there exists an odd perfect number, or else there exists no 
odd perfect number. 


From the actual point of view, there is no essential difference between 
these two propositions. In each case the argument runs as follows: either 
there exists an odd perfect number between 10 and 100 (or in the set of 
all natural numbers), in which case (1.1) and (1.2) are correct, or else 
there is no such number, and in this case also (1.1) and (1.2) are correct. 

But in case (1.2) an adherent of the constructivist school will argue as
follows: the assertion that an odd perfect number exists is meaningful
only if such a number has been found (constructed). On the other hand,
the assertion that no odd perfect number exists is meaningful only after
we have shown that the assumption of the existence of such a number
leads to a contradiction (i.e., that we can construct a contradiction on
the basis of this assumption). But in the present state of our knowledge
we cannot make either of these assertions and thus we have no reason to
conclude that case (1.2) is true.

2 A natural number is called perfect if it is equal to half the sum of its divisors;
for example, 6 is perfect. It is not known whether an odd perfect number exists.

Propositions like (1.1) and (1.2) are special cases of the so-called 
law of the excluded middle (tertium non datur). The actual point of view, 
in contrast to the potential, accepts this law in every case. 
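
The difference between (1.1) and (1.2) can be made concrete. For (1.1) a finite search settles the question, as in the following Python sketch; the function names are hypothetical, and the definition of "perfect" is the one given in the footnote (half the sum of all divisors, the number itself included).

```python
def divisor_sum(n):
    """Sum of all positive divisors of n, including n itself."""
    return sum(d for d in range(1, n + 1) if n % d == 0)


def is_perfect(n):
    """A natural number is perfect if it equals half the sum of its divisors."""
    return n >= 1 and divisor_sum(n) == 2 * n


# Proposition (1.1): examine every odd number strictly between 10 and 100.
odd_perfect_between_10_and_100 = [n for n in range(11, 100, 2) if is_perfect(n)]

# The second alternative of (1.1) is thereby established by a finite search.
no_odd_perfect_in_range = not odd_perfect_between_10_and_100
```

No search of this kind can be completed for (1.2), since the range is no longer finite; this is exactly where the constructive point of view withholds its assent.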

The constructive mathematician is an inventor; by means of his con- 
structions he creates new entities. On the other hand, the classical mathe- 
matician, who regards the infinite as given, is a discoverer. The only 
entities he can find are those that already exist. 

It is customary nowadays to give the name classical to the actual point 
of view, although the potential attitude can also be traced back to 
antiquity. 


1.5. Remarks on the Following Sections 


These and other differences in the various conceptions of mathematics 
have given rise to a great diversity of opinion about the foundations and 
nature of mathematics, particularly with regard to where the boundary 
should be drawn between mathematics and logic. Within the space at 
our disposal it is impossible to discuss all these questions from every 
point of view. In the following sections we give preference to the classical 
position, with an occasional reference to the constructivist point of view, 
when the difference between them is important. Our reasons for giving 
preference to the classical position are as follows: (1) the greater part of 
established present-day mathematics is based more or less on the classical 
conception, whereas many parts of constructive mathematics are still 
in the process of being built up; (2) the constructive mathematics appears 
to be far more complicated than the classical. For example, it is not 
possible to speak simply of the real numbers. These numbers fall into 
various ‘“‘levels,’’ and for each level there exist still more complicated 
numbers. 

In the present chapter we have no intention of giving an encyclopedic 
survey. We have given priority to such questions as are naturally related 
to college instruction. In some cases the treatment is more detailed 
because the authors believe that the subject is suitable for discussion by 
undergraduates in a mathematics club. 

The material has been arranged as follows: mathematical proof depends 
on the fact that propositions have a certain structure (§2); from the 
classical point of view the basic concept of logic and mathematics is that 
of a consequence (§3), which plays a fundamental role in the axiomatic 
method (§4); in practice, the mathematician obtains consequences by 
carrying out proofs (§6), a process which has been analyzed in a profound
way in the theory of calculi (§5). The next three sections deal with the 
theory of sets (§7), Boolean algebra (§8), and the theory of relations (§9). 
A system of axioms of great importance for the mathematician is the 
Peano system for the natural numbers (§10). Finally, we give an analysis 
of some of the best-known antinomies (§11). 


Bibliography 


The bibliography at the end of the present chapter contains several textbooks 
of mathematical logic dealing with the various problems discussed in the 
following sections. Let us mention here, once and for all: Beth [1], Curry [1],
Kneebone [1], Novikov [1], Rosser [1], Wang [1], and the article on “Logic”
by Church [2] in the Encyclopaedia Britannica. On intuitionism see Heyting
[1] and Lorenzen [1], and on the history of logic see Kneale [1].


2. Logical Analysis of Propositions 


2.1. The Language of Mathematics 


The results of mathematics, like those of any other science, must be 
communicable. The communication may take place in either spoken or 
written form, but for mathematics the difference between them is of no 
great importance. In studying the foundations of mathematics it is 
customary to use written symbols. 

Communication is ordinarily carried on in one of the natural languages, 
such as English. But a natural language decays and renews itself like an 
organism, so that we are engaged in a rather risky business if we wish to 
entrust “eternal, unchanging truths” of mathematics to such a changing
language. Everyone knows how easily misunderstandings arise in the 
ordinary spoken language. So to attain clarity in his science, the mathe- 
matician must try to eliminate the ambiguities of such a language, although 
the attempt involves a laborious process of evolution and cannot be 
completely successful. One method of producing greater clarity lies in 
formalization. In the ordinary mathematical literature this process is only 
partly carried out, as can be seen by a glance at any mathematical text, 
but in studies of the foundations of mathematics, ordinary speech has 
been completely replaced by formalized languages. To some extent these 
artificial languages have been abstracted from ordinary language by a 
process of analyzing the statements of the latter and retaining only what 
is logically important. Let us now undertake this process of logical analysis. 
The reader will note a certain resemblance to grammatical analysis, but 
many of the distinctions made in grammar have no significance in logic. 
As a result, technical terms common to logic and grammar do not 
necessarily have the same meaning. Finally, let us emphasize once and for
all that the process of logical analysis is not uniquely determined and could 
just as well be undertaken in a manner different from the one adopted here. 


2.2. Propositions 

Many combinations of letters are called propositions. For example: 
(2.1) Every even number is the sum of two odd numbers. 
(2.2) Every odd number is the sum of two even numbers. 


(2.3) Every positive even number, with the exception of the number two, 
is the sum of two prime numbers. 


In classical logic, which goes back to Aristotle, propositions are divided 
into true propositions and false propositions. The principle of two-valuedness 
states that every proposition is either true or false, although it is not 
required that we should always be able to decide which is the case. For
example, it remains unknown at the present time whether the Goldbach 
conjecture (2.3) is true or false, but in classical logic it is assumed that 
statement (2.3) is in fact either true or false. 

Thus the classical logic recognizes two truth values, true and false (often 
represented by T and F). Today attention is also paid to many-valued
logics, and attempts are being made to apply them in quantum mechanics. 

The classical point of view has often been criticized (cf. §1.4). But even 
if we adopt a different attitude, we still accept certain propositions, for 
example (2.1) and reject others, for example (2.2); and in general there 
will be propositions which, at least up to now, have been neither accepted 
nor rejected, for example proposition (2.3). 

It must be emphasized that in the terminology adopted here, which is 
customary in modern researches in the foundations of mathematics, 
a proposition is simply a set of written symbols, so that it becomes essential 
to distinguish between the proposition itself and the state of affairs which 
it describes. Since this distinction will be of importance in the following 
sections, let us point out that one of the most profound thinkers in modern 
logic, G. Frege (1848-1925), distinguishes between the sense (Sinn) and 
the denotation (Bedeutung) of a proposition. By the denotation of a 
proposition, Frege means its truth value. Thus the propositions
“1 + 1 = 2” and “2 + 2 = 4” have the same denotation, namely
true.³ But these propositions have different senses. Similarly, the designations
(not propositions) “2 · 2” and “2²” have the same denotation,
namely the number four, but they too have different senses.

3 One must distinguish between a proposition and a name for the proposition, and
when we speak of an object, we must have a name for it. Thus we shall make frequent
use of the following convention: we obtain a name for a proposition (or more generally
for a set of written symbols) if we enclose the proposition (the set of written symbols)
in quotation marks. In the present section we shall strictly observe this convention, but
later it will be convenient, as frequently in mathematics, to let a set of written symbols
stand as a name for itself (autonomous notation).


2.3. Propositional Forms 


In mathematics, we frequently encounter, in addition to the propo- 
sitions, sets of symbols of the following sort: 


(2.4) x + 3 = y,          (2.5) f(2,3) = 5,
(2.6) f(x,y) = z,         (2.7) P2.


We are not dealing here with propositions, since it is obviously meaningless 
to ask, for example, whether (2.4) is true or false. The characteristic feature 
of these new formations is that they contain variables, namely “x,” “y,” “z,”
“f,” “P.” Variables are letters that do not refer to any definite entity but
rather to a definite range of entities, whose names can be substituted for
these variables; the range of the variables must be determined in each case.
Thus in (2.4) and (2.6) the “x,” “y,” and “z” are number variables;
for the “x,” “y,” “z” we may substitute the names of numbers, e.g., “3” or
“77.” In examples (2.5) and (2.6) the “f” is a function variable, for which
we may, for example, substitute “+” and in this way⁴ convert (2.5) into
the proposition “2 + 3 = 5.” In particular, the range to which the
variable refers may consist of sets of linguistic expressions, when we
may allow the entities themselves (and not their names) to be substituted
for the variables. A case of this sort occurs in (2.7), where “P” is a predicate
variable, referring to predicates. An example of a predicate is the set of
written symbols “is a prime number.” When this predicate is substituted
for “P,” the expression (2.7) becomes the proposition “2 is a prime
number.” Written symbols like “2” are called subjects (cf. §2.5), so that
“x,” “y,” “z” are subject variables.

In order to indicate that a variable “x” has the real numbers for its
range, mathematicians often say that x is an indeterminate real number, 
but phrases of this sort are misleading and should be avoided. 

After replacement of the variables by objects in their specified ranges, 
expressions (2.4) through (2.6) become propositions. Consequently, 
such sets of symbols are called propositional forms (formulas).⁵ If we
agree, as is often done, to extend the meaning of a propositional form to
include the propositions themselves, then the latter are propositional 
forms without free variables. 
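
A propositional form behaves much like a function whose arguments are its free variables: only after substitution does it acquire a truth value. The following Python sketch illustrates this for (2.4), (2.5), and (2.7). The function names are hypothetical, and the particular substitutions are chosen by us for illustration.

```python
import operator


def form_2_4(x, y):
    """Propositional form (2.4): x + 3 = y, with number variables x, y."""
    return x + 3 == y


def form_2_5(f):
    """Propositional form (2.5): f(2,3) = 5, with function variable f."""
    return f(2, 3) == 5


def is_prime(n):
    """The predicate "is a prime number" for natural numbers."""
    return n >= 2 and all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))


def form_2_7(P):
    """Propositional form (2.7): P2, with predicate variable P."""
    return P(2)


# Substitution turns each form into a proposition with a truth value.
prop_a = form_2_4(1, 4)           # "1 + 3 = 4"
prop_b = form_2_5(operator.add)   # "2 + 3 = 5"
prop_c = form_2_5(operator.mul)   # "2 · 3 = 5"
prop_d = form_2_7(is_prime)       # "2 is a prime number"
```

It makes no sense to ask whether `form_2_4` itself is true or false; the question arises only for values such as `prop_a`.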

4 Strictly speaking, of course, this proposition should read “+(2,3) = 5,” but we
will permit ourselves to make such changes tacitly.
5 See the footnote in §4.1.

When a proposition is analyzed logically step by step, we usually
encounter intermediate forms that are no longer propositions but are
still propositional forms. For example, consider the Fermat conjecture:


(2.8) There do not exist natural numbers x, y, z, n, for which x · y · z ≠ 0
and 2 < n and xⁿ + yⁿ = zⁿ,


where it is natural to regard the propositional form 
(2.9) x · y · z ≠ 0 and 2 < n and xⁿ + yⁿ = zⁿ,


as a logically important part of (2.8). 
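
The role of the propositional form inside the conjecture can be sketched as follows: the form, read as x · y · z ≠ 0 and 2 < n and xⁿ + yⁿ = zⁿ, becomes a function of four variables, and the conjecture asserts that no substitution makes it true. The bound used below is an arbitrary assumption made for this illustration; a finite search of this kind cannot establish the conjecture.

```python
from itertools import product


def form_2_9(x, y, z, n):
    """Propositional form (2.9) with number variables x, y, z, n."""
    return x * y * z != 0 and 2 < n and x ** n + y ** n == z ** n


BOUND = 12  # arbitrary small bound, chosen only to keep the search short

# All substitutions below the bound that turn (2.9) into a true proposition.
witnesses = [
    (x, y, z, n)
    for x, y, z, n in product(range(BOUND), repeat=4)
    if form_2_9(x, y, z, n)
]
```

The condition 2 < n is essential: the substitution x = 3, y = 4, z = 5, n = 2 satisfies the other two conditions but not this one.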

Let us therefore examine propositions and propositional forms simul- 
taneously. In the analysis of propositional forms we find, in addition to 
the variables, two types of elements. First there are such frequently 
repeated words (or groups of words) as “not,” “and,” “or,” “for all,”
which in a certain sense are the logical framework of a proposition. 
The most important of these are the propositional constants (§2.4) and 
the quantifiers (§2.6). Secondly there are the words (or groups of words) 
that are characteristic of the mathematical theory under examination at 
the moment and do not occur, in general, in other theories. Examples are
“2,” “a,” “is a prime number,” “lies on,” “+.” The most important
types here are subjects, predicates, and function signs (§2.5). 

In the following sections we shall examine these elements more closely. 
They should be compared with the operator of set formation in §7.7, 
the notation for functions in §8.4, and the description operator in §2.7. 


2.4. The Propositional Constants 


These serve the purpose of combining propositional forms in order to 
construct new propositional forms. A simple example is “and.”
The two propositional forms 


(2.10) 2 divides x (2.11) 3 divides x 
are combined by “and” into the one propositional form 
(2.12) 2 divides x and 3 divides x. 


The propositional form (2.12) is called the conjunction of (2.10) and (2.11). 
The conjunction of two propositions is again a proposition, which is true 
(accepted) if and only if both the components united by the “and” are
true (accepted). This fact is expressed by the 


Truth table (logical matrix)        | T  F
for conjunction                   T | T  F
                                  F | F  F


For example, the conjunction of a false proposition (the “F” in the left
column of the above table) with a true proposition (the “T” of the top
row) is a false proposition (the “F” at the intersection of the given row
and column).
Another propositional constant is “not,” as in 


(2.13) 8 is not a perfect square. 


In a logical systematization of the language it is customary to put the 
“not” at the beginning and to write: 


(2.13’) Not 8 is a perfect square. 


The proposition (2.13’) is called the negation of ‘8 is a perfect square.” 
In the nonclassical schools of logic, negation is either completely banned 
or, if admitted, it is variously interpreted by the various schools. One 
possibility consists of accepting the negation of a proposition a if from a 
we can derive a contradiction (i.e., a proposition that is always rejected). 
If negation is admitted at all, it is always subject to the condition that no 
proposition is accepted together with its negative. In the classical two-valued
logic it follows that “not” reverses the truth value. Thus we have:


Truth table (logical matrix)      T | F
for negation                      F | T


Another important propositional constant is “or.” The word “or,” 
which in everyday English has several different meanings, is almost always 
used in mathematics in the nonexclusive sense of the Latin “vel,” for
example: 


(2.14) Every natural number greater than two is a prime number or has 
a prime factor. 


The combination of two propositions by the nonexclusive “or” is called
an alternative (or also a disjunction, although it would be more correct to
reserve the word “disjunction” for the combination of propositions
expressed by “either—or”). An alternative is true (accepted) if and only
if at least one of its components is true (accepted):


Truth table (logical matrix)              | T  F
for the alternative (disjunction)       T | T  T
                                        F | T  F


The “either—or” is used like the Latin “aut,” as indicated in the following
table:


Truth table (logical matrix)        | T  F
for the strict disjunction        T | F  T
                                  F | T  F
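
The truth tables given so far can be reproduced mechanically. The following Python sketch treats each propositional constant as a function on the truth values; the names `CONNECTIVES`, `negation`, and `truth_table` are hypothetical, and Python's `True` and `False` stand in for T and F.

```python
from itertools import product

# Each binary constant as a function of two truth values.
CONNECTIVES = {
    "and": lambda p, q: p and q,
    "or": lambda p, q: p or q,          # nonexclusive, Latin "vel"
    "either-or": lambda p, q: p != q,   # exclusive, Latin "aut"
}


def negation(p):
    """The constant "not" reverses the truth value."""
    return not p


def truth_table(connective):
    """Map each pair (row value, column value) to the resulting truth value."""
    return {(p, q): connective(p, q) for p, q in product([True, False], repeat=2)}


def show(value):
    return "T" if value else "F"


# Print each table with the left component as row and the right as column.
for name, connective in CONNECTIVES.items():
    table = truth_table(connective)
    print(name)
    for p in (True, False):
        row = " ".join(show(table[(p, q)]) for q in (True, False))
        print(f"  {show(p)} | {row}")
```

The two disjunctions differ only in the entry for two true components, which is the whole difference between "vel" and "aut."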




Among the other constants of the propositional calculus we shall 
mention here only implication (and its consequence equivalence), which in 
the English language is represented by the words “if—then.” For the
“if—then” of ordinary spoken language, the logicians have distinguished,
in the course of the centuries, several essentially different meanings.
We shall restrict ourselves here to describing the one which appears most
often in classical logic and mathematics and can be traced back to the
Stoics (Philon, ca. 300 B.C.). If a reader feels that he cannot reconcile the
“if—then” of the following truth table with his everyday spoken language,
he is referred to §3. 

Let us now take up the task of constructing a truth table for “if—then.”
The four entries will be determined as soon as we have fixed on the truth
value of the following four propositions:


(2.15) If 1 + 1 = 2, then 1 + 1 = 2.
(2.16) If 1 + 1 = 2, then 1 + 1 = 3.
(2.17) If −2 = 2, then (−2)² = 2².
(2.18) If 1 + 1 = 3, then 1 + 1 = 3.


We regard (2.15) and (2.18) as true, and (2.16) as false. As for (2.17), 
we can argue as follows: The proposition 


(2.17') For arbitrary real numbers x, y it is true that, if x = y, then x² = y²


is true. A statement that holds for arbitrary real numbers x, y, holds in 
particular for x = —2 and y = 2. Thus we recognize (2.17) as a true 
proposition. Consequently we have the 


Truth table (logical matrix)        | T  F
for implication                   T | T  F
                                  F | T  T


We establish the convention that in discussing the classical logic we shall 
use “if—then” in the above sense. It should be noted that there is no
inherent connection between the two parts of an implication defined in
this way. For example, the following proposition is true: “if 7 + 4 = 11,
then a triangle with three equal angles has three equal sides.” 
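
The construction of this table can be retraced mechanically. The following is only a sketch: the helper name `implies` and the encoding of truth values as Python booleans are assumptions made for illustration, and the four test cases are the propositions (2.15) to (2.18).

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Classical implication: false only when p is true and q is false."""
    return (not p) or q


# The propositions (2.15)-(2.18) as (antecedent, consequent) pairs.
cases = {
    "2.15": (1 + 1 == 2, 1 + 1 == 2),
    "2.16": (1 + 1 == 2, 1 + 1 == 3),
    "2.17": (-2 == 2, (-2) ** 2 == 2 ** 2),
    "2.18": (1 + 1 == 3, 1 + 1 == 3),
}
truth_values = {label: implies(p, q) for label, (p, q) in cases.items()}

# The full logical matrix; keys are (antecedent, consequent).
matrix = {(p, q): implies(p, q) for p, q in product([True, False], repeat=2)}

for label, value in truth_values.items():
    print(label, "T" if value else "F")
```

Only (2.16), the case of a true antecedent with a false consequent, comes out false; the content of the two components plays no role.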

An equivalence (“if and only if”) may be defined as a conjunction of
reciprocal implications (see below). Thus we have the


Truth table (logical matrix)
for equivalence

        | T   F
      --+-------
      T | T   F
      F | F   T


The propositional constants “not,” “and,” “or,” “if-then,” “if and
only if” occur so frequently in mathematics that it is worthwhile to
introduce symbols for them. Usage in present-day logic is not yet uniform.
In the following table the symbol given first is the one used in this article.


List of Propositional Symbols


Connective     Everyday English    Symbol

Negation       not                 $\neg$ (suggests “$-$”), over-lining

Conjunction    and                 $\land$ (dual to “$\lor$”), &, ., immediate
                                   juxtaposition

Alternative    or                  $\lor$ (suggests “vel”)

Implication    if-then             $\rightarrow$

Equivalence    if and only if      $\leftrightarrow$ (combination of “$\rightarrow$” and
                                   “$\leftarrow$”), $\equiv$


As already mentioned, we may consider an equivalence as the conjunction
of two reciprocal implications. But then we may also say that the
equivalence is defined by this conjunction. If we introduce the propositional
variables “p,” “q,” it is easy to calculate from the tables that we may put
“$p \leftrightarrow q$” in place of “$(p \rightarrow q) \land (q \rightarrow p)$,” as may be seen by calculating
the four cases p, q = T, T; T, F; F, T; F, F. To state a definition we use
the sign “$\Leftrightarrow$,” thus, in the present case:


(2.19) $p \leftrightarrow q \;\Leftrightarrow\; (p \rightarrow q) \land (q \rightarrow p)$.


Similarly, we can justify the following definitions:


(2.20) $p \rightarrow q \;\Leftrightarrow\; \neg p \lor q$,
(2.21) $p \lor q \;\Leftrightarrow\; \neg(\neg p \land \neg q)$,
(2.22) $p \land q \;\Leftrightarrow\; \neg(\neg p \lor \neg q)$.
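
The calculation of the four cases can be left to a few lines of code. This is a sketch only: it assumes the readings of (2.19) to (2.22) spelled out in the comments, and the helper names (`implies`, `iff`, `same_table`) are illustrative and not part of the text.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    # Truth table for implication from §2.4, written out case by case.
    table = {(True, True): True, (True, False): False,
             (False, True): True, (False, False): True}
    return table[(p, q)]


def iff(p: bool, q: bool) -> bool:
    # Truth table for equivalence: true exactly when both components agree.
    return p == q


def same_table(lhs, rhs) -> bool:
    """True if both connectives agree in the cases T,T; T,F; F,T; F,F."""
    return all(lhs(p, q) == rhs(p, q)
               for p, q in product([True, False], repeat=2))


checks = {
    # (2.19): equivalence as the conjunction of reciprocal implications
    "2.19": same_table(iff, lambda p, q: implies(p, q) and implies(q, p)),
    # (2.20): p -> q defined as (not p) or q
    "2.20": same_table(implies, lambda p, q: (not p) or q),
    # (2.21): p or q defined as not(not p and not q)
    "2.21": same_table(lambda p, q: p or q,
                       lambda p, q: not ((not p) and (not q))),
    # (2.22): p and q defined as not(not p or not q)
    "2.22": same_table(lambda p, q: p and q,
                       lambda p, q: not ((not p) or (not q))),
}
print(checks)
```

Each entry of `checks` is true precisely when the two sides of the corresponding definition have the same truth table.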


2.5. Subjects, Predicates, and Function Signs 
If we examine the following propositions and propositional forms: 


(2.23) 4 is a prime number,      (2.24) x lies between 2 and 9,
(2.25) $3 < x$,                  (2.26) $2 + 4 = 8$,


we see that in addition to the variable “x” they contain the following
elements:


the subjects “2,” “3,” “4,” “8,” “9”;
the predicates “is a prime number,” “lies between ... and,” “<,” “=”;
the function sign “+.”

In the proposition “6 exceeds 3,” it is true that from the grammatical
point of view “6” is the subject and “3” is the object, but in logic both
the “6” and the “3” are subjects.

The above predicates are successively 1-, 3-, 2-, and 2-place predicates, and
the function sign “+” is a two-place function sign.

Higher-place predicates also occur in mathematics: e.g., the four-place
predicate in the propositional form “the point-pair A, B separates the
point-pair C, D.” A k-place predicate becomes a proposition through the
adjunction of k subjects, and in agreement with this manner of speaking
we shall sometimes say that the propositions are 0-place predicates.

In principle, function signs can be dispensed with entirely, being
replaced by predicates. For example, the “+” is superfluous if we
introduce the three-place predicate “is the sum of ... and.” For then in (2.26)
we may write: “8 is the sum of 2 and 4.” Since function signs can be
eliminated in this way, it is a common practice in purely logical investigations
to confine oneself to predicates, and in §3 we will take advantage
of this simplification. But mathematicians would be unwilling to give up
the functional notation, which is a very suggestive one.
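
The distinction between predicates of various place numbers, and the replacement of a function sign by a predicate, can be sketched as follows. All function names here are hypothetical; a k-place predicate is modelled as a Python function of k arguments returning a truth value.

```python
def is_prime(n: int) -> bool:
    """One-place predicate 'is a prime number'."""
    return n >= 2 and all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))


def lies_between(x: int, a: int, b: int) -> bool:
    """Three-place predicate 'x lies between a and b'."""
    return a < x < b


def less_than(a: int, b: int) -> bool:
    """Two-place predicate '<'."""
    return a < b


def is_sum_of(z: int, x: int, y: int) -> bool:
    """Three-place predicate 'z is the sum of x and y', replacing the sign '+'."""
    return z == x + y


# Propositions: all subject places are filled, so a truth value results.
prop_2_23 = is_prime(4)           # "4 is a prime number" (a false proposition)
prop_2_26 = is_sum_of(8, 2, 4)    # "8 is the sum of 2 and 4" (also false)


# Propositional forms: the variable x is still free.
def form_2_24(x: int) -> bool:
    return lies_between(x, 2, 9)


def form_2_25(x: int) -> bool:
    return less_than(3, x)
```

Note that (2.23) and (2.26) are propositions even though both happen to be false, whereas (2.24) and (2.25) receive a truth value only after a subject has been substituted for x.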

The importance of subjects and predicates will be discussed below 
in §3.3. 


2.6. Operators in the Calculus of Predicates; Bound Variables 
The proposition 


(2.27) All positive numbers are squares 


contains the operator (or quantifier) “all” of predicate logic, which we
may analyze in the following way (although there are other possibilities):
we are dealing here with the one-place predicates “is a positive number”
and “is a square,” which we may make more prominent by reformulating
the proposition:


(2.27') For all entities: If an entity is a positive number, then this entity
is a square.


Here the repeated word “entity” obviously has the task of indicating
the places to which the operator “all” shall refer. The same task may be
performed in a clear and simple way if we insert one and the same sign
in each of these places; for example, the letter “z.” Thus we get the standard
form:


(2.27") For all z: If z is a positive number, then z is a square.


The letter “z” serves only to mark the place; instead we could use any
other letter, e.g., “y.” It must be noted that “z” is not a variable of the
kind considered in §2.5, since (2.27") is a genuine proposition, as distinct
from a propositional form. If “z” is replaced in (2.27") by the name of a
number, e.g., “2,” we do not obtain a proposition, but rather the linguistic
gibberish: “for all 2: If 2 is a ....”

It is customary to call the letter “z,” as used in (2.27"), a bound variable,
whereas the variables considered earlier are free variables. In the propositional
form


(2.28) If z is a positive number, then z is a square


the letter “z” is a free variable, and (2.27") is obtained from (2.28) by
binding the “z” with the quantifier “all.” In this way, a free variable
becomes bound.

Bound variables refer, in the same way as free variables, to a given 
range; in the present case, for example, to the set of real numbers. 

The quantifier “all” is called the universal quantifier, and (2.27") is the
universal quantification of (2.28). A synonym for “all” is, e.g., “every,”
and a phrase like “for no z” means “for all z not.”

A second operator in the calculus of predicates is the existential quantifier
“there exists” or “there exist” or “for some,” as in the following example.


(2.29) There exist prime numbers.
(2.29') There exists a y, such that y is a prime number.
(2.29") For some x: x is a prime number.
(2.30) x is a prime number.


The propositions (2.29') and (2.29") are variants of (2.29). The existential
quantifier can also be used to bind variables. In this connection (2.29) is
called an existential quantification of (2.30).

If the range of the variable is finite (for example, the natural numbers
from 1 to 9), then the universal and existential quantifiers are, respectively,
equivalent to a multiple conjunction and a multiple alternative. Thus if
“P” stands for “is a prime number,” then “Every number is a prime
number” is equivalent to “$P1 \land P2 \land \cdots \land P9$” and “There exists a prime
number” is equivalent to “$P1 \lor P2 \lor \cdots \lor P9$.” Consequently we speak
of a generalized conjunction or alternative and introduce the symbols
“$\bigwedge$” for the universal quantifier and “$\bigvee$” for the existential. Then the
familiar Cauchy definition of the continuity of a function f in an interval J
takes the following easily understood form:⁶


(2.31) $\bigwedge_x \bigwedge_\varepsilon \bigl(x \in J \rightarrow \bigvee_\delta \bigwedge_y (y \in J \land |x - y| < \delta \rightarrow |f(x) - f(y)| < \varepsilon)\bigr)$.


6 Here the variables x, y refer to real numbers, and the variables $\varepsilon$, $\delta$ to positive real
numbers. Without the latter convention, the statement of (2.31) would be somewhat
more complicated.

For every x and every $\varepsilon$ there exists, if x is an element of the interval J,
a $\delta$ such that for every element y of the interval J whose distance from x
is less than $\delta$ the difference between the functional values f(x) and f(y) is
less than $\varepsilon$.
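
Over a finite range the two quantifiers can be evaluated directly as a multiple conjunction and a multiple alternative, and nested quantifiers become nested loops in the same order. The following sketch rests on loud assumptions: Python's built-ins `all` and `any` stand in for the two quantifiers, and the nested check of (2.31) runs only over small hypothetical sample sets (`points`, `epsilons`, `deltas`), so it illustrates the order of the quantifiers and proves nothing about continuity.

```python
def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))


numbers = range(1, 10)  # the finite range 1, ..., 9

# "Every number is a prime number": P1 and P2 and ... and P9
every_number_is_prime = all(is_prime(n) for n in numbers)
# "There exists a prime number": P1 or P2 or ... or P9
some_number_is_prime = any(is_prime(n) for n in numbers)


def continuous_on_samples(f, points, epsilons, deltas) -> bool:
    """Quantifier pattern of (2.31): for all x, for all eps, there is a delta,
    for all y. The condition 'y in J and |x - y| < delta' filters the inner loop."""
    return all(
        any(
            all(abs(f(x) - f(y)) < eps for y in points if abs(x - y) < delta)
            for delta in deltas
        )
        for x in points
        for eps in epsilons
    )


points = [i / 8 for i in range(9)]   # sample points 0, 1/8, ..., 1
epsilons = [1.0, 0.5, 0.25]          # sample values for epsilon
deltas = [0.5, 0.25]                 # candidate values for delta


def square(x: float) -> float:
    return x * x


def step(x: float) -> float:
    return 0.0 if x < 0.5 else 1.0   # jumps at x = 1/2


square_passes = continuous_on_samples(square, points, epsilons, deltas)
step_passes = continuous_on_samples(step, points, epsilons, deltas)
```

On these samples the squaring function passes and the step function fails at x = 1/2, where no candidate $\delta$ keeps the functional values within $\varepsilon$.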

It is essential to note that if, as in the present case, several quantifiers
appear in the same proposition (or propositional form), then the bound
variables used (here x, y, $\varepsilon$, $\delta$) must be distinct from one another.

In classical logic (but only there!) either of the above quantifiers can be
defined in terms of the other. For if H is an arbitrary propositional form,
we can write:


(2.32) $\bigwedge_x H \;\Leftrightarrow\; \neg \bigvee_x \neg H$,
(2.33) $\bigvee_x H \;\Leftrightarrow\; \neg \bigwedge_x \neg H$.


These definitions indicate a certain “duality” between $\bigwedge$ and $\bigvee$, which
corresponds to a duality between $\land$ and $\lor$ (cf. also §9.2).
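
For a finite range this duality can be confirmed by exhausting all cases. The sketch below assumes a three-element range and represents a propositional form H by its truth values on that range; the function name `duality_holds` is hypothetical.

```python
from itertools import product


def duality_holds(domain) -> bool:
    """Check (2.32) and (2.33) for every assignment of truth values to H."""
    for values in product([True, False], repeat=len(domain)):
        H = dict(zip(domain, values))
        for_all = all(H[x] for x in domain)
        exists = any(H[x] for x in domain)
        # (2.32): for all x, H  agrees with  not (there exists x with not H)
        if for_all != (not any(not H[x] for x in domain)):
            return False
        # (2.33): there exists x with H  agrees with  not (for all x, not H)
        if exists != (not all(not H[x] for x in domain)):
            return False
    return True


print(duality_holds([1, 2, 3]))
```

The check reduces to the definitions (2.21) and (2.22) applied to a multiple alternative and a multiple conjunction.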
The following notations are to be found in the literature: 


Universal quantifier: $\bigwedge_x H$, $(x)H$, $\forall x\,H$, $\prod_x H$,
Existential quantifier: $\bigvee_x H$, $(\exists x)H$, $(Ex)H$, $\exists x\,H$, $\sum_x H$.


2.7. Identity and Description 


The notation $x = y$ ($x \equiv y$) means that x and y are the same entity.
With this sign for identity we can formulate the statement that the property
denoted by a given predicate is possessed by exactly one entity. If
we let “$\mathfrak{P}$” stand for the predicate “is an even prime number,” then the
fact that there exists exactly one even prime number can be represented
by the proposition:


$\bigvee_x \mathfrak{P}x \land \bigwedge_x \bigwedge_y (\mathfrak{P}x \land \mathfrak{P}y \rightarrow x = y)$.


If the property indicated by a predicate holds for exactly one entity,
we may speak of the entity which has this property. Here we need the
description operator, represented in ordinary English by some such words
as “that-which” and usually denoted in logic by the symbol $(\iota x)$. Thus
$(\iota x)\mathfrak{P}x$ is a name for the number 2, in which x occurs as a bound variable
(cf. §2.6). If we are given an arbitrary predicate Q, e.g., “is divisible by 2,”
then $Q(\iota x)\mathfrak{P}x$ means that the property indicated by Q is possessed by
that unique entity for which $\mathfrak{P}$ holds. The expression $Q(\iota x)\mathfrak{P}x$ is often
used by Russell as an abbreviation for the proposition: There exists
exactly one entity which possesses the property indicated by $\mathfrak{P}$, and all
entities which have this property also have the property indicated by Q;
or, expressed in symbols:


$\bigvee_x \mathfrak{P}x \land \bigwedge_x \bigwedge_y (\mathfrak{P}x \land \mathfrak{P}y \rightarrow x = y) \land \bigwedge_x (\mathfrak{P}x \rightarrow Qx)$.


This proposition is still meaningful (though false), if the property indicated
by $\mathfrak{P}$ does not hold for exactly one entity.
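
On a finite range both the description operator and the abbreviation attributed to Russell can be evaluated directly. This is a sketch under stated assumptions: the range 1, ..., 99 stands in for the natural numbers, and the names `the` and `russell` are hypothetical.

```python
def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))


def is_even_prime(n: int) -> bool:
    return n % 2 == 0 and is_prime(n)


def divisible_by_2(n: int) -> bool:
    return n % 2 == 0


domain = range(1, 100)  # finite stand-in for the natural numbers


def the(P, domain):
    """Description operator: the unique x in the domain for which P holds."""
    witnesses = [x for x in domain if P(x)]
    if len(witnesses) != 1:
        raise ValueError("P does not hold for exactly one entity")
    return witnesses[0]


def russell(Q, P, domain) -> bool:
    """Exactly one entity has P, and every entity with P also has Q."""
    exists = any(P(x) for x in domain)
    unique = all(x == y for x in domain for y in domain if P(x) and P(y))
    all_p_are_q = all(Q(x) for x in domain if P(x))
    return exists and unique and all_p_are_q


the_even_prime = the(is_even_prime, domain)
q_of_the_even_prime = russell(divisible_by_2, is_even_prime, domain)
# With "is a prime number" in place of P the description fails: there are
# many primes, and the Russell proposition is meaningful but false.
q_of_the_prime = russell(divisible_by_2, is_prime, domain)
```

The function `the` refuses to return a value when uniqueness fails, while `russell` still yields a truth value in that case, namely false.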


Exercises for §2 


1. Set up the truth table for “neither-nor.” Represent this connective
in terms of
(a) $\neg$ and $\land$,
(b) $\neg$ and $\lor$.
2. Calculate the truth tables for the propositional forms: 
(a) (pAqg)>-?P 
(b) $(p \rightarrow q) \rightarrow (\neg p \rightarrow \neg q)$
(CC) (PY AgQ)A AQP) 
(d) $((p \rightarrow q) \land (q \rightarrow r)) \rightarrow (p \rightarrow r)$
3. Express (2.14) in formal language, with the following definitions: 
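Nx $\Leftrightarrow$ x is a natural number,
Px $\Leftrightarrow$ x is a prime number,
Gxy $\Leftrightarrow$ x is greater than y,
Rxy $\Leftrightarrow$ x divides y.
4. Translate into English:


$\bigvee_x (Nx \land \bigwedge_y (Gxy \land Ny \rightarrow (Ryx \lor Py)))$.


5. What does
$\bigvee_x (31 < x \land \neg R2x \land \neg R3x \land x < 37)$
mean?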


6. Formulate the axioms of a system of axioms for geometry in the 
symbolism of the predicate logic. 


Bibliography


On the technical use of symbols see Carnap [2]. Information on the use of 
symbols can also be found in many textbooks of mathematical logic (see the 
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Tarski [1] and Frege [2]. 




3. The Concept of a Consequence 


3.1. Semantics 


In this section we discuss a concept which must be considered as basic in 
the classical treatment of mathematics and particularly of axiomatization. 
We wish to investigate the connection that exists between, for example, 
the Euclidean axioms and the theorem of Pythagoras, a connection 
which is usually expressed in the form: the theorem of Pythagoras is a 
consequence of the Euclidean axioms. In the present section we think of 
this connection as being static: if the Euclidean axioms are given, then 
the Pythagorean theorem is in some sense given at the same time. But we
may also think of the situation as a dynamic one: given the Euclidean
axioms, how can we proceed, step by step, to derive the theorem from
them? We will return to this question in §6.

The connection between the theorem and the axioms, established by
saying that the theorem is a consequence of the axioms, can be described
as follows: the language in which we formulate our mathematical theorems 
stands in a certain relation to the actual “world,” which is to some 
extent described by the language. In other words, the actual world 
provides an interpretation of the language. The science that deals with 
such questions is today called semantics. Some of the concepts of semantics 
can be traced as far back as Aristotle and were important in the work 
of Bolzano, which remained to a great extent unrecognized in his time. 
The modern science of semantics is due to A. Tarski. 

In contrast to semantics, investigations of a language that have nothing 
to do with any interpretation of it are called syntax. 


3.2. Definitions⁷


In the construction of a mathematical theory we not only formulate 
and prove theorems but also make definitions. A definition is an abbreviation.
For example, “x is a prime number” stands for “x is a natural
number which is different from 1 and has no factor other than 1 and
itself.” Although the importance of definitions is largely a practical one,
it must not be underestimated. If it were not for such abbreviations, the 
majority of mathematical theorems would be so cumbersome as to be 
completely unintelligible. 

In our study of the concept of a consequence, we must take the defini- 
tions into account. It would be simplest, of course, to eliminate them 
entirely by replacing them with the expressions for which they stand. 
In the Pythagorean theorem, for example, the expression “is a right-angled
triangle” would be replaced by some expression involving only
the fundamental concepts of geometry. If, for convenience, we allow the
definitions to stand as they are, it would be more precise to say: the
theorem of Pythagoras is a consequence of the Euclidean axioms and
the definitions that are used in the formulation of the theorem.


7 For the so-called recursive definitions see §7.4.


3.3. The Ontological Assumptions of Semantics 


Let us examine more carefully the ideas underlying this attempt to 
define a consequence more precisely, since in the semantic construction 
of mathematics it is assumed that such ideas are “understood.” If we
ask for the meaning of the linguistic expressions we have called subjects
and predicates, we see that a subject is a name for an individual, and a
predicate is a name for an attribute (a property). Subjects in ordinary
speech, such as “Lincoln” or “New York,” name individuals that have
a “real existence.” Many mathematicians hold the view that individuals
such as those named by the subjects “2” and “π” have an “ideal existence,”
being of different kinds according to the branch of mathematics under 
consideration. In real analysis, for example, they are the real numbers; 
and in the theory of functions of a complex variable they are the complex 
numbers. The individuals investigated in any given context are regarded 
as forming a domain of individuals: for example, the domain of natural 
numbers. The domain of individuals may have finitely or infinitely many 
elements but is assumed to have at least one element. 

It is also assumed that together with any given domain of individuals 
the totality of all relevant properties is also given. In this connection a 
property is relevant if for each individual in the domain the answer to 
the question whether or not the individual possesses the property is in 
the nature of things well defined, even though we may not be able to decide 
whether it is “yes” or “no.” This is the ontological basis of the Aristotelian
principle of two-valuedness (cf. §2.2). In addition to the one-place
properties, such as the one described by the predicate “is a prime number,”
we also consider many-place properties, e.g., the two-place property
(or relation) denoted by “<.” For an n-place property (or relation), it is
assumed that for every ordered n-tuple of individuals from the domain 
under consideration it is determined in the nature of things whether the 
individuals in the given order stand in the given relation or not. 


3.4. Mathematical Axioms as Propositional Forms 


The concept of a mathematical consequence has been developed chiefly 
in connection with geometry, above all in researches on the independence 
of the parallel postulate. We shall therefore take geometry as the starting 
point for our discussion. The modern attitude toward the axioms of 
geometry was described in a drastic way by Hilbert when he said: “We
must always be able to replace the words ‘point,’ ‘line,’ and ‘plane’ by
‘table,’ ‘chair,’ and ‘beer-mug.’”

Of course, Hilbert does not mean that the theorems of geometry will
remain true if we make the suggested change, but only that for mathematics,
which has the problem of determining consequences of the axioms of
geometry, it is of no importance whether we speak of points, etc., or of
tables, etc. In other words: if a geometrical proposition is a consequence
of the Euclidean axioms, then the proposition that arises from it through
Hilbert’s suggested change in terminology is a consequence of the corresponding
axioms arising from the change. In the epigrammatic phrase
of Bertrand Russell, “a mathematician does not need to know what he is
talking about, or whether what he says is true.”

Since in geometry (as in any purely mathematical science; cf. §1.2) 
we have no interest in the meaning of the predicate “is a point,” we may
replace it (and correspondingly the other geometric predicates) by a
predicate variable, thereby concentrating our attention on what is
mathematically essential and doing away with everything else. If we write
“P” for “is a point,” “G” for “is a line,” and “L” for “lies on,” the first
Euclidean axiom (in Hilbert’s formulation)


(3.1) Given any two points A, B, there exists a line a which corresponds
to each of the two points A, B.

Given two points A, B, there is not more than one line which corresponds
to each of the given points A, B,


becomes, in the logical symbols introduced in our earlier sections:

(3.2) $\bigwedge_x \bigwedge_y (Px \land Py \land x \neq y \rightarrow \bigvee_g (Gg \land Lxg \land Lyg))$
$\land\ \bigwedge_x \bigwedge_y \bigwedge_g \bigwedge_h (Px \land Py \land x \neq y \land Gg \land Gh \land Lxg \land Lxh \land Lyg \land Lyh \rightarrow g = h)$.


Thus we see that for the pure mathematician it is more precise to regard 
the geometric axioms as propositional forms [like (3.2)] than as proposi- 
tions [like (3.1)]. The so-called fundamental concepts of a given mathe- 
matical theory, i.e., the subjects and predicates appearing in its axioms, 
are in this sense simply linguistic paraphrases for subject variables and 
predicate variables. When the axioms are regarded as propositional 
forms, they cannot be said to be either true or false. They become true 
or false only after the variables occurring in them (i.e., the fundamental 
concepts of the given mathematical theory) have been given an inter- 
pretation; that is, only when to each (free) subject variable we have 
assigned an individual of the underlying domain of individuals and to 
each predicate variable a property (with the same number of places) of 
the elements of the domain. When propositional variables occur, they are
to be interpreted by means of propositions. Then it becomes meaningful
to say that a given propositional form is true or false in this interpretation. 
The fact that a propositional form H is true in the interpretation D is 
expressed by saying: D satisfies H, D is a model of H, D verifies H, or H 
is true in D. As an example let us choose the domain of natural numbers 
and consider the propositional forms 


(3.3) $Px$                          (3.4) $\neg Px$
(3.5) $Px \land Qx$                 (3.6) $Px \lor Qx$
(3.7) $Px \land \neg Px$            (3.8) $Px \lor \neg Px$
(3.9) $\bigvee_x (Px \land Qxy)$    (3.10) $\bigwedge_x Px \leftrightarrow \bigwedge_y Py$.


The form (3.3) is true in the interpretation which to the variable x assigns 
the number 4, and to the variable P the property of being even; in other 
words, 4 has this property. The form (3.3) is not true if P is interpreted 
as before while x is interpreted as 5. The form (3.4) is the opposite of (3.3). 
The form (3.6) is true, and (3.5) is not true, if x is interpreted as 4, while 
P is interpreted as the property of being prime, and Q as that of being a 
perfect square. The form (3.8) is true for any interpretation, and (3.7) 
for none. Consequently, (3.8) is said to be valid or a tautology, and (3.7) 
is contradictory or a contradiction. In (3.9) only the P, Q, y require 
interpretation and in (3.10) only the P, since the other variables are bound 
(cf. §2.6). The form (3.9) is true if y is interpreted as 10, P as the property 
of being prime, and Q as the relation of “smaller than,” since there 
exists at least one number which is both prime and smaller than 10. The 
form (3.10) is a tautology, expressing the fact that a bound variable may 
be renamed at will. See also the examples in §3.8. 
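
The interpretations just described can be written out as a small sketch. It assumes a finite initial segment of the natural numbers as domain (needed only for the bound variable in (3.9)), models predicate variables as function arguments, and uses hypothetical names throughout.

```python
import math

domain = range(0, 50)  # finite stand-in (an assumption) for the natural numbers


def is_even(n: int) -> bool:
    return n % 2 == 0


def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d != 0 for d in range(2, math.isqrt(n) + 1))


def is_square(n: int) -> bool:
    return math.isqrt(n) ** 2 == n


def smaller_than(a: int, b: int) -> bool:
    return a < b


# Propositional forms: the free variables P, Q, x, y are the parameters.
def form_3_3(P, x): return P(x)
def form_3_5(P, Q, x): return P(x) and Q(x)
def form_3_6(P, Q, x): return P(x) or Q(x)
def form_3_7(P, x): return P(x) and not P(x)
def form_3_8(P, x): return P(x) or not P(x)
def form_3_9(P, Q, y): return any(P(x) and Q(x, y) for x in domain)  # x is bound


results = {
    "3.3, P = even, x = 4": form_3_3(is_even, 4),
    "3.3, P = even, x = 5": form_3_3(is_even, 5),
    "3.5, P = prime, Q = square, x = 4": form_3_5(is_prime, is_square, 4),
    "3.6, P = prime, Q = square, x = 4": form_3_6(is_prime, is_square, 4),
    "3.9, P = prime, Q = smaller than, y = 10": form_3_9(is_prime, smaller_than, 10),
}

# (3.8) is true and (3.7) false under every interpretation tried here.
always_3_8 = all(form_3_8(P, x) for P in (is_even, is_prime, is_square) for x in domain)
never_3_7 = not any(form_3_7(P, x) for P in (is_even, is_prime, is_square) for x in domain)
```

Trying finitely many interpretations does not establish that (3.8) is a tautology; that follows from its truth table, and the loop merely illustrates it.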


3.5. The Artificial Language of the Predicate Logic 

The propositional forms (3.2), (3.3), ... (3.10) contain, apart from 
brackets, only logical symbols and subject and predicate variables. 
These propositional forms are called expressions in the predicate logic. 
Here it is important that only the subject variables, and not the predicate 
variables, can be bound by quantifiers.⁸ The language of this predicate logic
is an artificial language capable of expressing a great part of mathematics.
As soon as we have chosen a domain of individuals, we can interpret the
subject variables and the predicate variables and can then give an exact
definition of what it means to say that a proposition is true in this interpretation.
It is most convenient to construct a definition inductively by
proceeding from simpler to more complicated expressions. Through lack
of space we must content ourselves with this remark and with the above
examples.


8 If we also allow the predicate variables to be bound, we are in the so-called “logic
of the second order,” or “extended predicate logic,” cf. §10.2.


3.6. The Concept of a Consequence 


Now let $\mathfrak{A}$ be the set of axioms and H a theorem in a mathematical theory,
e.g., in Euclidean geometry. We then say that H is a consequence of $\mathfrak{A}$.
If we now take H and the elements of $\mathfrak{A}$ to be propositional forms and
interpret the fundamental concepts in such a way that all the axioms are
true, it is reasonable to expect that in the given interpretation H will
also be true. Thus we have a necessary condition which the concept
of a consequence must satisfy. In order to give the widest possible meaning
to the concept, we agree to regard this necessary condition as being also
sufficient. In this way we arrive at the following

Definition of a Consequence: The propositional form H follows from the
set $\mathfrak{A}$ of propositional forms (H is a consequence of $\mathfrak{A}$) if every model
common to all the propositional forms of $\mathfrak{A}$ is also a model of H.


Examples: $Py$, and also $Qy$, follow from $Py \land Qy$; and $\bigvee_x Px$ follows
from $\bigwedge_x Px$ (here it must be noted that by §3.3 a domain of individuals
contains at least one element). Also, $\bigwedge_y Py$ follows from $\bigwedge_x Px$ and
conversely. Every propositional form follows from a contradictory propositional
form. A tautology follows from any propositional form.
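
For forms with one-place predicate variables, the definition can be tried out by enumerating every interpretation over small finite domains. The sketch below makes strong assumptions: only the predicate variables P, Q and one free subject variable y are interpreted, and only domains with one to three elements are searched. Such a search can refute a claimed consequence by exhibiting a counter-model, but it cannot establish one, since the definition refers to all domains.

```python
from itertools import product


def interpretations(size: int):
    """All interpretations of P, Q (one-place) and y over {0, ..., size - 1}."""
    domain = range(size)
    for bits in product([False, True], repeat=2 * size):
        P = dict(zip(domain, bits[:size]))
        Q = dict(zip(domain, bits[size:]))
        for y in domain:
            yield domain, P, Q, y


def follows(premises, conclusion, max_size: int = 3) -> bool:
    """False as soon as some model of all premises falsifies the conclusion."""
    for size in range(1, max_size + 1):       # domains are never empty (§3.3)
        for D in interpretations(size):
            if all(A(*D) for A in premises) and not conclusion(*D):
                return False
    return True


# Propositional forms as functions of an interpretation (domain, P, Q, y).
def Py(dom, P, Q, y): return P[y]
def Qy(dom, P, Q, y): return Q[y]
def Py_and_Qy(dom, P, Q, y): return P[y] and Q[y]
def forall_P(dom, P, Q, y): return all(P[x] for x in dom)
def exists_P(dom, P, Q, y): return any(P[x] for x in dom)
def contradiction(dom, P, Q, y): return P[y] and not P[y]


checks = {
    "Py from Py and Qy": follows([Py_and_Qy], Py),
    "exists from for-all": follows([forall_P], exists_P),
    "for-all from exists": follows([exists_P], forall_P),
    "Qy from Py": follows([Py], Qy),
    "Qy from a contradiction": follows([contradiction], Qy),
}
print(checks)
```

The third and fourth checks fail because a counter-model exists already on a domain with at most two elements; the last succeeds because a contradictory form has no model at all.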


3.7. Consequence and Tautology 


If the number of axioms is finite, we can reduce the concept of a consequence
to that of a tautology. For this purpose we first form the
conjunction $\Theta$ of all the axioms in $\mathfrak{A}$. Then we have the important theorem:


H follows from $\mathfrak{A}$ if and only if $\Theta \rightarrow H$ is a tautology.


This theorem expresses the relation between “follows from” and “if-then.”
The theorem is proved as follows:


(a) We first assume that H follows from $\mathfrak{A}$. Then we must show that
$\Theta \rightarrow H$ is a tautology, i.e., that $\Theta \rightarrow H$ is true for every interpretation over
an arbitrary domain of individuals. To this end we make an arbitrary
interpretation D of the given domain. In case $\Theta$ is false in D, then $\Theta \rightarrow H$
is certainly true in D (cf. the logical matrix in §2.4); and in case $\Theta$ is true
in D, then every axiom in $\mathfrak{A}$ is true in D, so that H must also be true in D,
in view of the hypothesis that H is a consequence of $\mathfrak{A}$; thus in this case
also $\Theta \rightarrow H$ is true in D.

(b) We now assume that $\Theta \rightarrow H$ is a tautology and must show that
H follows from $\mathfrak{A}$. But if this were not the case, then there would be an
interpretation D for which all the propositional forms in $\mathfrak{A}$ (and therefore
$\Theta$) would be true but H would be false. Then D falsifies $\Theta \rightarrow H$,
in contradiction to the hypothesis that $\Theta \rightarrow H$ is a tautology.


3.8. Examples of Tautologies 


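(3.11) $\neg\neg p \leftrightarrow p$,

(3.12) $\neg(p \land q) \leftrightarrow (\neg p \lor \neg q)$,
(3.13) $\neg(p \lor q) \leftrightarrow (\neg p \land \neg q)$,
(3.14) $\neg(p \rightarrow q) \leftrightarrow (p \land \neg q)$,
(3.15) $\neg(p \leftrightarrow q) \leftrightarrow (p \leftrightarrow \neg q)$,
(3.16) $\neg \bigwedge_x H \leftrightarrow \bigvee_x \neg H$,
(3.17) $\neg \bigvee_x H \leftrightarrow \bigwedge_x \neg H$.


These tautologies form the basis for the technique of negation. We
obtain a simple application of the theorem in §3.7 if we weaken (3.14) to


(3.14') $\neg(p \rightarrow q) \rightarrow (p \land \neg q)$.


Then $p \land \neg q$ follows from $\neg(p \rightarrow q)$.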
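
For the propositional tautologies the check is a finite truth-table calculation, and the theorem of §3.7 can be observed on the same example. The sketch assumes the readings of (3.11) to (3.15) given in the comments (double negation, the two De Morgan rules, negated implication, negated equivalence); the helper names are hypothetical.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    return (not p) or q


def iff(p: bool, q: bool) -> bool:
    return p == q


def is_tautology(form, n: int) -> bool:
    """True if the n-place form is true for all 2**n assignments."""
    return all(form(*values) for values in product([True, False], repeat=n))


def follows(premise, conclusion, n: int) -> bool:
    """Every assignment that verifies the premise also verifies the conclusion."""
    return all(conclusion(*values)
               for values in product([True, False], repeat=n)
               if premise(*values))


tautologies = {
    # (3.11) not not p <-> p
    "3.11": is_tautology(lambda p: iff(not (not p), p), 1),
    # (3.12) not (p and q) <-> (not p or not q)
    "3.12": is_tautology(lambda p, q: iff(not (p and q), (not p) or (not q)), 2),
    # (3.13) not (p or q) <-> (not p and not q)
    "3.13": is_tautology(lambda p, q: iff(not (p or q), (not p) and (not q)), 2),
    # (3.14) not (p -> q) <-> (p and not q)
    "3.14": is_tautology(lambda p, q: iff(not implies(p, q), p and (not q)), 2),
    # (3.15) not (p <-> q) <-> (p <-> not q)
    "3.15": is_tautology(lambda p, q: iff(not iff(p, q), iff(p, not q)), 2),
}

# Theorem of §3.7 on the example (3.14'): consequence and tautology coincide.
def premise(p, q): return not implies(p, q)
def conclusion(p, q): return p and (not q)

as_consequence = follows(premise, conclusion, 2)
as_tautology = is_tautology(lambda p, q: implies(premise(p, q), conclusion(p, q)), 2)

# A negative case: q does not follow from p, and p -> q is not a tautology.
q_from_p = follows(lambda p, q: p, lambda p, q: q, 2)
p_implies_q_tautology = is_tautology(implies, 2)
```

In both the positive and the negative case the two tests agree, as the theorem requires.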


Exercises for §3 


1. Which of the following propositional forms are tautologies and 
which are contradictions ? 


(a) $\bigvee_x Px \lor \bigwedge_x \neg Px$,
(b) $\bigwedge_x Px \lor \bigvee_x Px$,
(c) $\neg \bigwedge_x Px \land \neg \bigvee_x \neg Px$,
(d) V — Px V (Py > Qy), 
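(e) $\bigwedge_x \bigwedge_y (Rxy \land Ryx \rightarrow x = y)$.
2. $H_1 \Leftrightarrow \bigwedge_x \bigwedge_y \bigwedge_z (Rxy \land Ryz \rightarrow Rxz)$
$H_2 \Leftrightarrow \bigwedge_x \bigwedge_y (Rxy \lor Ryx)$
Over the domain of individuals {1, 2, ..., 10} give interpretations
which will falsify, or verify,
(a) $H_1$
(b) $H_2$
(c) $H_1 \land H_2$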
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3, Hy (=) VA Ry 
H, (=) A A Rxy 
Hy (=) A A Rxy 
Hy (=) VV Rxy 
Hy (=) AA Rxy 
Hy (=) ¥ V Rxy 


Which expressions follow from which? In the cases in which $H_i$ does
not follow from $H_j$ give a counterexample, i.e., a model of $H_j$ which
is not a model of $H_i$.


Bibliography
semantics see Carnap [1], Linsky [1] and Tarski [2].


4. Axiomatization 


4.1. The Origin of Systems of Axioms 

It is today customary to construct a mathematical science axiomatically, 
that is, by first choosing a set of propositions⁹ as the axioms and then
drawing consequences from them. The subjects and predicates that occur
in the axioms are called the fundamental concepts of the system of axioms.
From the modern point of view these concepts are considered to be variables,
as explained in §3.4. In general, the number of axioms is finite, although
infinite systems of axioms are sometimes admitted if their structure is 
immediately clear. For example, we might take for axioms all the propo- 
sitions of a certain form. In this case we sometimes speak of an axiom 
schema (for examples, see §10.3 and §11.2). 

If the fundamental concepts occurring in the axioms are taken to be 
variables, so that the axioms themselves become propositional forms, we 
can no longer regard them as “self-evident.” Moreover, if two systems
of axioms are equivalent (that is, if each of them is a consequence of the
other), then in principle they are on an equal footing, even though one
of them may be preferred on more or less subjective grounds, e.g., because
of its greater logical clarity.

9 By the arguments in §3.4 we should really say “propositional forms” instead of
“propositions,” but here we wish to conform to the ordinary mathematical usage, in
which axioms and theorems are called “propositions.”

Theoretically, we could use any propositions at all to form our set of
axioms, but it turns out that in modern mathematics relatively few systems
are in actual use. So it is natural to ask about the motives for choosing
these particular systems. We shall confine ourselves to a discussion of 
this question from the following point of view: It is an established fact 
that in many cases the theory had a prior existence (at least to a great 
extent) and the axioms for it were chosen later. But in many other cases the
axioms are primary; and the theory is to a certain extent secondary,
since it has been created and defined by the axioms themselves. We shall
distinguish the two cases by speaking of a heteronomous and an autonomous
system of axioms, but it must be emphasized that these concepts
are idealizations; in fact, it is often very hard to decide how a given 
system of axioms actually arose. 


4.2. Heteronomous Systems of Axioms (Subsequent Choice of Axioms) 

In general, we are dealing here with the following problem: we are 
given a set $\mathfrak{B}$, usually large, of preassigned propositions and we must
find a system of axioms (as simple and clear as possible) from which all
these propositions follow.¹⁰

A characteristic example is provided by any theory in physics, or in
any other science based on observation. Here the preassigned set $\mathfrak{B}$
consists of a large number of empirical facts, perhaps accompanied by
certain hypotheses, and it is our task to find a system of axioms $\mathfrak{A}$ that
will provide an economical description of the whole relevant body of
knowledge $\mathfrak{B}$. Assuming that we have found such a system of axioms $\mathfrak{A}$,
we obtain a mathematical science if we ask what are the consequences that
follow from $\mathfrak{A}$; but if we then proceed to ask whether these consequences
(so far as they can be tested) are in agreement with observation, we are
in the domain of theoretical physics. Here again the distinction is clear
between a natural science (cf. §1.2) and mathematics as a purely abstract
science. When a mathematical system of axioms $\mathfrak{A}$ has arisen in this way,
we shall say that the theory determined by $\mathfrak{A}$ has a physical (or, more
generally, an empirical) origin. It seems reasonable to believe that
Euclidean geometry is such a science. Basic geometrical concepts, like
point and line, originated from the need to describe physical data, and
consequently the first geometrical propositions were of a physical nature.
An example is the theorem of Pythagoras, already well known to the
Babylonians in 1700 B.C. This physical origin of geometry becomes
particularly clear when we reflect that “experiments” are often made in
school to convince the students that the sum of the angles in a triangle is
180°. The axiomatization of geometry was begun by the Greeks, who
from the time of Thales (about 590 B.C.) showed that certain geometrical
propositions could be made to depend upon others. In relinquishing all
recourse to experience, they became the creators of mathematics in the
strict sense of the word. The name of Euclid (about 300 B.C.) marks
the completion (for the time being) of the axiomatization of geometry.


10 The system of axioms must, of course, be consistent. See §4.7.

It is also reasonable to suppose that arithmetic and, to take a modern 
example, the theory of sets have an empirical origin. The axiomatization 
of these sciences will be discussed in §10 and §7. 


4.3. Autonomous Systems of Axioms (Systems of Axioms 
as Sources of New Theories) 

The mathematical theories discussed above were already in existence, at 
least in a certain sense, long before the corresponding systems of axioms, 
as becomes quite clear when we recall that in the schoolroom these 
sciences are often presented without reference to any system of axioms 
at all; for example, Euclidean geometry in the secondary school and the 
infinitesimal calculus or naive set theory in the university are often 
taught in this way. But the situation is completely different in modern 
mathematical sciences like group theory, ring theory, or lattice theory. 
These theories cannot be separated from their axioms, since it is only 
through the axioms that they have come into existence at all. A typical 
example is the theory of groups. In the development of mathematics it 
has often happened that widely diverse subjects have been seen to depend 
on lines of argument that are surprisingly similar to one another; e.g., 
the period of the decimal expansion of a given rational number compared 
with the number of times a dodecahedron must be rotated in order to 
bring it back to its original position. It would clearly be more economical 
not to repeat such arguments at every new occasion but to present them 
once and for all in such a form that they are immediately applicable to 
every special case. But an even more important advantage is the fact 
that by proceeding in this way we concentrate on the essential features 
of the situation, thereby gaining a deeper insight into the connections 
among its various parts. In group theory such a program has been carried 
out. It is possible to formulate a small number of axioms with only one
fundamental concept, namely group multiplication, such that the theory 
is defined, or so to speak created, by the axioms themselves. The con- 
sequences of these axioms are called the theorems of group theory. Then by 
interpreting the group multiplication in various ways, each of which must 
satisfy the axioms, we at once obtain the original theorems in the various 
branches of mathematics that led us originally to create the theory of 
groups. The whole of modern mathematics is characterized (cf. III14) by 
the attempt to give an increasingly central role to such systems of axioms 
as those of group theory. 

Several of the more modern studies of geometry consist of examining 
the consequences of a part of the Euclidean axioms, e.g., the axioms of
connection or the axioms of order. Systems of axioms of this sort can
also be called autonomous. Similarly, the autonomous systems for algebra 
are to be considered as arising from the heteronomous system for arith- 
metic. 


4.4. Independence of a System of Axioms 


A system of axioms is said to be independent if no axiom is a consequence 
of the others. In general, independence is desirable but not altogether 
necessary; often it is an advantage that can be obtained only at the cost 
of great complication. 

The independence of a given system of axioms is most simply demon- 
strated by finding for each axiom H an interpretation in which H is false 
but all the other axioms are true. As a simple example let us consider the 
three axioms that define an equivalence relation R: 


(4.1) $\bigwedge_x Rxx$ (reflexivity)
(4.2) $\bigwedge_x \bigwedge_y (Rxy \rightarrow Ryx)$ (symmetry)
(4.3) $\bigwedge_x \bigwedge_y \bigwedge_z ((Rxy \wedge Ryz) \rightarrow Rxz)$ (transitivity)


In each case let us choose as domain of individuals the set of three natural 
numbers {1, 2, 3}. 


Independence of (4.1): we interpret R as the empty relation (i.e., the
relation that holds for no pair).

Independence of (4.2): we interpret R as the $\le$ relation.

Independence of (4.3): we interpret R as the relation that holds between
two elements x, y of the domain of individuals if and only if
$|x - y| \le 1$.


The fact that the parallel axiom is independent of the other Euclidean 
axioms can also be proved by this method (see II2, §2). 
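
Since the domain {1, 2, 3} is finite, the three interpretations can be
checked mechanically. The following Python sketch is only an illustration;
the function names are our own, and the order relation is taken to be
"less than or equal to" so that reflexivity and transitivity hold.

```python
from itertools import product

DOMAIN = (1, 2, 3)

def reflexive(R):
    return all(R(x, x) for x in DOMAIN)

def symmetric(R):
    return all(R(y, x) for x, y in product(DOMAIN, repeat=2) if R(x, y))

def transitive(R):
    return all(R(x, z)
               for x, y, z in product(DOMAIN, repeat=3)
               if R(x, y) and R(y, z))

def profile(R):
    """Truth values of reflexivity, symmetry, transitivity under R."""
    return (reflexive(R), symmetric(R), transitive(R))

def empty(x, y):        # holds for no pair
    return False

def less_equal(x, y):   # the usual order on the natural numbers
    return x <= y

def near(x, y):         # holds iff |x - y| <= 1
    return abs(x - y) <= 1

for name, R in (("empty", empty), ("less_equal", less_equal), ("near", near)):
    print(name, profile(R))
```

Each interpretation makes exactly one of the three axioms false, which is
what the independence argument requires.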


4.5. Completeness of a System of Axioms 


Let there be given an axiom system $\mathfrak{A}$. A proposition that contains
only subjects and predicates already occurring in $\mathfrak{A}$ will be called a
relevant proposition, and $\mathfrak{A}$ is said to be complete¹ if for every relevant
proposition H, either H follows from $\mathfrak{A}$ or $\neg H$ follows from $\mathfrak{A}$. This
is, of course, different from saying that $H \vee \neg H$ follows from $\mathfrak{A}$; the latter
proposition is always true, since $H \vee \neg H$ is a tautology (tertium non
datur).


¹ Other definitions of completeness can also be found in the literature; cf. §6.2.

Autonomous systems of axioms are in general incomplete as a result
of their inherent nature (cf. §4.6). E.g., from the system of axioms for 
group theory it is impossible, as can be easily shown by examples, to 
deduce either 


(4.3) $\bigwedge_x \bigwedge_y \; xy = yx$ or (4.4) $\neg \bigwedge_x \bigwedge_y \; xy = yx$


On the other hand it is natural to expect, in general, that heteronomous 
systems of axioms will be complete in view of their physical origin. 
For suppose we have a relevant proposition H such that neither H nor 
— H follows from the axioms. Then the physicist will at once attempt to 
obtain experimental evidence of the correctness or falsity of this propo- 
sition and, if successful, will add either H or — H to the set of axioms. 
Thus, physicists are always striving to complete their systems of axioms, 
so that it is natural to expect completeness in a well developed 
theory. 

Examples of complete systems of axioms are the system for Euclidean
geometry and the Peano system for the natural numbers (cf. §10.2).


4.6. Monomorphy of a System of Axioms 


The concept of isomorphy, familiar to every mathematician from 
group theory (see, e.g., IB2, §4.2), can be generalized (we omit the 
definition here; cf. IB10, §1.3) in such a way that one may speak of 
isomorphic interpretations of a system of axioms. To take an example 
from geometry: The “natural” interpretation of the Euclidean system of 
axioms, in which the points are “idealized actual points” and the lines are 
“idealized actual lines,” etc. is isomorphic to the interpretation provided 
by analytical geometry, in which the points are triples of numbers, the 
lines are the coefficients of the Hesse normal form, etc. 

If a given system of axioms is valid in one interpretation, it is also 
valid in any isomorphic interpretation. For example, if a given structure 
is a group, then every isomorphic structure is also a group. 

Consequently, it is impossible to characterize a given model completely 
by means of a system of axioms. The most that can be attained in this 
direction is to characterize the model “up to isomorphism.” A system of
axioms is said to be monomorphic (categorical) if any two models are 
isomorphic. 

Autonomous systems of axioms are intended to have a wide range of 
application and therefore, in general, they are not monomorphic; in fact, 
there exist nonisomorphic groups, nonisomorphic rings, etc. On the 
other hand, the heteronomous systems of Euclidean geometry and 
arithmetic (cf. §10) are monomorphic.

Every monomorphic system of axioms $\mathfrak{A}$ is complete: Let H be a relevant
proposition. Then we must show that H or $\neg H$ follows from $\mathfrak{A}$. We
proceed indirectly by assuming that neither H nor $\neg H$ follows from $\mathfrak{A}$.
By the definition of a consequence given in §3, there exists an interpretation
$D_1$ in which, since H does not follow from $\mathfrak{A}$, all the axioms of $\mathfrak{A}$ are
true but H is false. In the same way, there exists an interpretation $D_2$
in which, since $\neg H$ does not follow from $\mathfrak{A}$, all the axioms of $\mathfrak{A}$ are
true but $\neg H$ is false, and therefore (by the tertium non datur) H is true.
On account of the assumed monomorphy of $\mathfrak{A}$, the two interpretations
$D_1$ and $D_2$ are isomorphic, and since H is true in $D_2$, it follows that
H must also be true in $D_1$. But this contradiction refutes the assumption.


4.7. Consistency of a System of Axioms 


Here we discuss the concept of semantic consistency, to be distinguished 
from syntactic consistency (see §5.7), which is another extremely important 
concept in modern studies in the foundations of mathematics. A system 
of axioms is said to be (semantically) consistent if it has at least one model. 

In view of the physical origin of many heteronomous systems of axioms, 
it is natural to regard them as being consistent. But it must always be 
kept in mind that the consistency of a system of axioms is not, in general, 
an established fact but only a belief based on confidence in our intuitions. 
Particularly problematical is the consistency of a set of axioms that can 
only be interpreted in a domain with infinitely many individuals. 

The question of the consistency of a given system of axioms can often 
be reduced to the same question for another system, in which case we 
speak of a proof of relative consistency. Thus, by means of analytical 
geometry we can show that the system of axioms for Euclidean geometry 
is consistent if the system for real analysis is consistent. The most 
interesting proof of relative consistency is due to Gödel, who proved that
a system of axioms for set theory which includes the axiom of choice 
and the continuum hypothesis (see §7) is consistent relative to the same 
system without these axioms. 

In fact it is well known that belief in the existence of a suitable inter- 
pretation can be quite mistaken, for example, in naive set theory (see 
§7 and §11). 

A system of axioms $\mathfrak{A}$ is inconsistent (self-contradictory) if and only if
the proposition $H \wedge \neg H$ follows from $\mathfrak{A}$ for every relevant proposition H.
For if $\mathfrak{A}$ is inconsistent, then $\mathfrak{A}$ has no model. Thus, every model of $\mathfrak{A}$
is (vacuously) also a model of any arbitrary relevant proposition H, and in
particular of $H \wedge \neg H$; that is, $H \wedge \neg H$ follows from $\mathfrak{A}$. On the other hand, if
$H \wedge \neg H$ follows from $\mathfrak{A}$, every model of $\mathfrak{A}$ must also be a model of
$H \wedge \neg H$; but $H \wedge \neg H$ is unrealizable, and therefore $\mathfrak{A}$ has no
model.


Exercises for §4


1. The order and the successor relation for the natural numbers can be 
described by the following axioms (Peano-Hilbert-Bernays): 


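$\bigwedge_x (\neg\, x < x)$
$\bigwedge_x \bigwedge_y \bigwedge_z (x < y \wedge y < z \rightarrow x < z)$
$\bigwedge_x \; x < x'$
$\bigwedge_x \neg\, x' = 0$
$\bigwedge_x \bigwedge_y (x' = y' \rightarrow x = y)$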


Show by means of suitable models that this system of axioms is in- 
dependent. 


Bibliography 


See the textbooks listed in the other sections. 


5. The Concept of an Algorithm 


5.1. Examples of Algorithms 


Mathematicians are interested not only in theoretical insight and 
profound theorems but also in general methods for solving problems, 
methods whereby certain classes of problems can be handled in such a 
systematic way that the actual process of solution becomes, so to speak, 
automatic. Every newly discovered method represents an advance in 
mathematics, although the problems that are solvable by this method 
thereby become trivial and cease to form an interesting part of creative 
mathematics. 

A general method of this sort is often called a calculus, the name being 
derived from the small stones or calculi formerly used in computation. 
Another word with the same meaning is algorithm, derived from the 
name of the Arabic mathematician al-Khuwarizmi (about A.D. 800). 

Let us give some examples of algorithms: (a) the usual methods of
addition, subtraction, multiplication, and division of integers in the
decimal notation; (b) the Euclidean algorithm for the highest common
factor of two integers; (c) the well-known procedures for solving linear
and quadratic equations with integral coefficients; (d) the method of
extracting a square root by computing successive decimal places; (e)
integration of rational functions by means of partial fractions.

The essential feature of an algorithm is that it requires no inspiration 
or inventiveness but only the ability to recognize sets of symbols and to 
combine them and break them up according to rules prescribed in advance;
in other words, to carry out elementary procedures that can in principle
be entrusted to a machine. 

An algorithm proceeds step by step. Some algorithms, when applied
to a concrete problem, break off after a finite number of steps, as in the
above examples (a), (b), (c), (e). Others do not come to an end but can
be carried out as far as we like, as in the extraction of a square root,
example (d). In the above examples every step is, in general, uniquely
determined. But in other algorithms it may happen that each step depends
upon a free choice among several (finitely many) possibilities. For example,
consider the algorithm (f) which, when applied to two prescribed integers
a, b (in decimal notation), leaves open at each step a free choice
between two possibilities: when two numbers (including a and b) are
already found, we may take (1) their sum or (2) their difference. This
algorithm enables us to find all the numbers in the module (a, b)
generated¹² by the two numbers a and b.

A set of numbers (i.e., a row of symbols) which, as in this example,
can be determined by an algorithm, is said to be recursively enumerable.
Of course as long as the word “algorithm” is being used in an intuitive
way, the meaning of “recursively enumerable” also remains intuitive;
precise definitions are given in §5.5.

Algorithm (f) has two initial “formulas,” a and b, to be thought of
as given in their decimal notation, since an algorithm is restricted by its 
very definition to operating with rows of symbols (or equivalent objects). 
The above possibilities (1) and (2) for proceeding from one step to the 
next are called the rules of the algorithm. 

The initial formulas of an algorithm are sometimes called axioms and 
its rules are rules of inference. A finite sequence of formulas, in which 
each formula is an axiom or arises from the preceding formulas by 
application of one of the rules, is called a derivation or a proof. These 
terms are borrowed from logic but are used here in a much more general 
sense. 
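
The algorithm with the two initial formulas a, b and the two rules "sum"
and "difference" can be imitated by a short Python sketch. It is only an
illustration under stated assumptions: the name `module_elements` and the
parameter `bound` are ours. Since the module itself is infinite, the sketch
keeps only derived numbers of absolute value at most `bound`, and it assumes
that a and b themselves lie within that bound.

```python
def module_elements(a, b, bound):
    """Numbers derivable from the axioms a, b by the rules
    'sum' and 'difference', restricted to |n| <= bound."""
    derived = {a, b}                      # the two initial formulas
    changed = True
    while changed:                        # repeat until nothing new appears
        changed = False
        for x in list(derived):
            for y in list(derived):
                for value in (x + y, x - y):   # rule (1) and rule (2)
                    if abs(value) <= bound and value not in derived:
                        derived.add(value)
                        changed = True
    return sorted(derived)

print(module_elements(6, 4, 12))
```

For a = 6 and b = 4 the derived numbers are exactly the even numbers in the
chosen range, i.e., the multiples of the highest common factor 2.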


5.2. Examples of “Arithmetical” Algorithms 


An algorithm for the enumeration of finite sets of strokes (or, as we
may say, “of natural numbers”) can be described by one axiom


(5.1) |


and one rule


(5.2) $\frac{e}{e|}$


¹² For the concept of a module, see IB1, §2.3.

(Here $\frac{a}{b}$ is to be read: from a we may proceed to b.) This rule contains
a proper variable e, to be interpreted as follows: any expression already
derived may be substituted for e, and then a stroke may be added to the
right of it. For example, in the algorithm defined by (5.1) and (5.2), the
following expressions are derivable: | (as an axiom), ||, |||, ||||.

For the expressions derived in a given algorithm, we may use variables, 
say n, m, p, q, for the rows of symbols in the algorithm just described, 
and then these variables can be used to describe further algorithms. 
For example, we can define an algorithm for the addition of natural 
numbers (sequences of strokes). As an axiom we take 


(5.3) n + | = n|,


which is more precisely an axiom schema. Then n can be replaced by any
one of the rows of symbols, e.g., ||, that are derivable in the algorithm
defined by (5.1) and (5.2). As a specialization of (5.3), we obtain the axiom:


(5.3′) || + | = |||.

As the only rule in the new algorithm we take


(5.4) $\frac{n + m = p}{n + m| = p|}$


By setting || for n, | for m, and ||| for p, we obtain from (5.3′) the formula


(5.4′) || + || = ||||.


In order to construct an algorithm for multiplication, we adjoin the
further axiom (axiom schema):


(5.5) n × | = n,


and the rule (now with two “premisses”):


(5.6) $\frac{n \times m = p, \quad p + n = q}{n \times m| = q}$


As a special case of (5.5) we obtain

(5.5′) || × | = ||.

If we apply the rule (5.6) to (5.4′) and (5.5′), we have

(5.6′) || × || = ||||.
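
The derivations above can be retraced with rows of strokes represented as
Python strings. This is a hedged sketch, not part of the original text: the
function names are ours, and the second premiss of rule (5.6) is taken to be
p + n = q. Note that inside the code the Python operator `+` merely writes
one row of symbols after another.

```python
def numeral(k):
    """Row of k strokes (k >= 1): axiom (5.1), then k - 1 uses of rule (5.2)."""
    row = "|"
    for _ in range(k - 1):
        row = row + "|"          # rule (5.2): from e proceed to e|
    return row

def add(n, m):
    """Return the row p for which n + m = p is derivable."""
    p = n + "|"                  # axiom schema (5.3): n + | = n|
    for _ in range(len(m) - 1):
        p = p + "|"              # rule (5.4): from n + m = p infer n + m| = p|
    return p

def mul(n, m):
    """Return the row q for which n x m = q is derivable."""
    q = n                        # axiom schema (5.5): n x | = n
    for _ in range(len(m) - 1):
        q = add(q, n)            # rule (5.6): from n x m = p and p + n = q
    return q

print(add("||", "||"))   # formula (5.4') in the text
print(mul("||", "||"))   # formula (5.6') in the text
```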




5.3. Recursively Enumerable and Decidable Sets 


Although it has been possible to set up algorithms for the solution of 
many mathematical problems, others have continued to resist every attack 
of this kind, a prominent example being the “word problem” in group 
theory (cf. IB2, §16.1). As a result, mathematicians finally began to 
suspect that certain problems cannot be solved by any algorithm whatever. 
It is obvious that a theorem of this sort will become meaningful, and we 
can proceed to demonstrate it, only when we have given an exact definition 
of the concept of an algorithm. 

More precisely, we need only know what we mean by saying that a 
given set of rows of symbols is recursively enumerable, i.e., can be found 
by means of an algorithm. Here we must realize that more is expected 
from such a definition than, for example, from the definition of continuity 
of a function. In the latter case we are quite satisfied with the simple, 
well-known definition of Cauchy, since it is to a great extent in agreement 
with our intuition, although everyone knows, from certain striking 
examples, that this agreement is by no means complete. But for a recursively
enumerable set, where we are dealing with the question of what can or 
cannot be accomplished in an actual computation, the definition must 
agree to the greatest possible extent with our basic intuitive notion of 
what is meant by effective calculation of the answer to a given problem. 
The assertion that a given set is not recursively enumerable, i.e., that it 
is impossible to construct an algorithm for finding the elements of the 
set, is of interest only to the extent to which our formal definition of 
an algorithm is in agreement with our intuitive notion of a process of 
computation. 

Several different definitions have been suggested for enumerability (the 
first one of them by Church in 1936), but in spite of the fact that they 
originated in very different settings, they are all equivalent to one another. 
Consequently, many logicians and mathematicians are convinced that 
these definitions correspond completely to our intuitive notion of 
computability. They are to be considered from the classical point of view, 
since they make use of the nonconstructive phrase “there exists.” If they
have been criticized, it is usually by mathematicians who do not share the 
classical point of view and therefore assert that the definitions include 
more than our original intuitive notions. However, a proof of non- 
enumerability based on too broad a definition retains its validity when the 
definition is restricted. 

After a preparatory section, we shall give in §5.5 a definition of algorithm 
(or alternatively of recursive enumerability) which is based on the concept 
of a recursive function. We could set up an alternative definition by 
generalizing the procedure in §5.2; and a third method stems from the
fact that, in principle, every algorithm can be entrusted to a machine
(Turing). There are further possibilities but we omit them here for lack
of space.

A property $\mathfrak{E}$ of formulas is said to be decidable if the set of formulas
that have the property $\mathfrak{E}$ and also the set of formulas that do not have it
are recursively enumerable. The decidability of several-place properties
(relations) is defined correspondingly. In the case of a decidable property
we can decide, by any of the three methods of recursive enumeration
mentioned above, whether a given formula $\zeta$ has the property or not.
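
The decision procedure implicit in this definition can be sketched in
Python: we run the two enumerations alternately until the given object
appears in one of them. The sketch makes several assumptions of our own.
Natural numbers stand in for formulas, the property chosen is "being a
perfect square," and both enumerations are assumed to be infinite
generators.

```python
from itertools import count
from math import isqrt

def decide(x, enumerate_yes, enumerate_no):
    """Alternate between the two enumerations until x turns up in one."""
    yes, no = enumerate_yes(), enumerate_no()
    while True:
        if next(yes) == x:
            return True          # x was produced among those with the property
        if next(no) == x:
            return False         # x was produced among those without it

def squares():
    return (n * n for n in count())

def non_squares():
    return (n for n in count() if isqrt(n) ** 2 != n)

print(decide(49, squares, non_squares))
print(decide(50, squares, non_squares))
```

The search always terminates, because every object occurs in exactly one of
the two enumerations.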


5.4. Gödelization


The formulas that can be written in a given finite or countably infinite 
alphabet can be characterized in various ways by natural numbers (or 
by the sequences of strokes that correspond to them). We now describe 
one such method, taking as an example the formulas (words) that can 
be written with the twenty-six letters of the Latin alphabet. We first 
enumerate the letters; e.g., 1. a, 2. b, ..., 26. z. Now consider a given
n-letter word (i.e., a formula) in the alphabet, and let the numerals assigned
to the successive letters of this word be $\nu_1, \ldots, \nu_n$. Also let $p_1 = 2$, $p_2 = 3$,
$p_3 = 5$, ... be the sequence of prime numbers. Then the given word can
be characterized by the number (Gödel index)


(5.7) $p_1^{\nu_1} \cdot p_2^{\nu_2} \cdots p_n^{\nu_n}$.


For example, the word “cab” will receive the number $600 = 2^3 \cdot 3^1 \cdot 5^2$.
Distinct words correspond to distinct numbers but not every number
corresponds to a word. If the number of a word is known, the word
itself can be recovered.
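
The passage from a word to its number and back can be illustrated by the
following Python sketch for lowercase words in the Latin alphabet. The
function names are ours, and the simple trial-division prime generator is
chosen only for brevity.

```python
def primes():
    """Yield 2, 3, 5, 7, ... by trial division (adequate for short words)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def godel_number(word):
    """Product of p_i raised to the position of the i-th letter (a = 1)."""
    number = 1
    for p, letter in zip(primes(), word):
        number *= p ** (ord(letter) - ord("a") + 1)
    return number

def decode(number):
    """Recover the word, or raise ValueError if no word has this number."""
    if number < 1:
        raise ValueError("not the number of a word")
    letters = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        if not 1 <= exponent <= 26:      # a prime was skipped or overused
            raise ValueError("not the number of a word")
        letters.append(chr(ord("a") + exponent - 1))
    return "".join(letters)

print(godel_number("cab"))
print(decode(600))
```

A number such as 10 = 2 · 5 is rejected by `decode`, since the prime 3 is
missing; this reflects the remark that not every number corresponds to a word.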

A transition of this sort from the words to the corresponding numbers 
is called arithmetization or Gödelization. In all questions concerning
algorithms, it makes no difference whether we discuss the original
formulas or their Gödel numbers.

A recursively enumerable set of words is transformed in this way into 
a recursively enumerable set of natural numbers and vice versa. It therefore 
makes no difference, in principle, whether the desired exact definition of 
recursive enumerability is expressed in terms of words or of natural 
numbers. Since the natural numbers have a somewhat simpler structure 
and are more familiar to mathematicians, we will now proceed to define 
the concept of recursive enumerability for a set of natural numbers. 


5.5. Computable Functions and Recursively Enumerable Sets 


Instead of giving a direct definition of a recursively enumerable set, 
we shall first define the concept of a computable function, to which the 
concept of recursive enumerability can be reduced.

We consider functions, with one or more arguments ranging over the
entire set of natural numbers, whose values are also natural numbers. 
Such a function is said to be computable (in the intuitive sense) if, for 
arbitrarily preassigned arguments, there exists a procedure for calculating 
the value of the function in a finite number of steps. Examples of com- 
putable functions are the sum of two numbers, and their product. The 
following example defines a function f about which we do not know at 
the present time whether it is computable or not: 


(5.8) $f(n) = \begin{cases} 0, & \text{in case there exist natural numbers } x, y, z \text{ such that } x \cdot y \cdot z \neq 0 \text{ and } x^n + y^n = z^n, \\ 1 & \text{otherwise.} \end{cases}$


At present we know only a few of the values of this function, e.g.,


$f(1) = f(2) = 0$, $f(3) = f(4) = \cdots = f(100) = 1$.


If the Fermat conjecture is true, then $f(n) = 1$ for $n \ge 3$.

The following argument shows that the computable functions are 
exceptional. There cannot exist a greater number of computable functions 
than there are methods for computing them. Every method of com- 
putation must be capable of being described. A description consists of a 
finite number of symbols. It follows that there are only countably many 
possible descriptions, and therefore only countably many computable 
functions. On the other hand, the total number of functions is
uncountable, as may be proved by the same diagonal procedure as the
uncountability of the continuum (see §7). 

The concept of a recursively enumerable set can be reduced to that of a 
computable function. For we have the theorem: 


A non-empty set of natural numbers is recursively enumerable if and only
if it is the range of values of a computable function.¹³


To prove this theorem we argue as follows: A set which is the range 
of values of a computable function f can be recursively enumerated by 
calculating the successive values f(0), f(1), f(2), ..., as may be done in
each case in a finite number of steps. We thus obtain an algorithm that 
produces all the elements of the set (in general, of course, they will not 
be obtained in order of magnitude, but that is not necessary). 

¹³ It is customary to say that the empty set is also recursively enumerable.

On the other hand, let there be given a non-empty, recursively
enumerable set M, so that M contains at least one element $n_0$. Now the
successive steps of an algorithm for the recursive enumeration of M can
be arranged (if necessary by the adjunction of certain rules) in a unique
sequence, with a zeroth, first, second step, etc. Every step produces an
element of M, or at least an intermediate stage toward the production of
such an element. We now define a function f as follows:


$f(n) = \begin{cases} n_0, & \text{in case the } n\text{th step provides only an intermediate stage,} \\ k, & \text{in case the } n\text{th step provides an element of } M \text{ and this element is } k. \end{cases}$


From the definition of f it is clear that f is computable and that the range
of values of f coincides with the set M.
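
The construction of f from an enumeration algorithm can be sketched in
Python. Everything specific here is an assumption made for illustration:
the set M is taken to be the set of squares, the generator `steps` is a
hypothetical enumeration algorithm in which every other step is merely an
intermediate stage (represented by `None`), and the fixed element of M is 0.

```python
from itertools import islice

def steps():
    """A hypothetical step-by-step enumeration of M = {0, 1, 4, 9, ...}."""
    n = 0
    while True:
        yield None           # intermediate stage: no element produced
        yield n * n          # this step produces an element of M
        n += 1

FIXED_ELEMENT = 0            # some element known to belong to M

def f(n):
    """Value attached to the n-th step (steps are counted from zero)."""
    result = next(islice(steps(), n, None))
    return FIXED_ELEMENT if result is None else result

print([f(n) for n in range(8)])
```

Each value of f is found after finitely many steps, and the values of f are
exactly the elements that the enumeration produces.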

Thus it is only necessary to give a precise definition of the concept of a 
computable function. This precise definition is provided by the recursive 
functions as defined in the next section. 


5.6. Recursive Functions 


In the domain of natural numbers the sum function is determined by 
two equations (cf. §5.2): 


(5.9) $x + 0 = x$,
(5.10) $x + y' = (x + y)'$,


where the successor of y is denoted by y′. These equations enable us to
calculate the sum u + v of any pair of natural numbers u, v in a purely
formal way. For this purpose we require only two rules: (a) for the
variables occurring in (5.9) and (5.10) we may substitute numerals
(0, 0′ (= 1), 0″ (= 2), ...), and (b) if for these numerals we have already
derived the result $z_1 + z_2 = z_3$, then on the right-hand side of any
subsequently derived equation we may replace $z_1 + z_2$ by $z_3$.
Corresponding rules hold for the product function, except that in this case the
set of two equations (5.9) and (5.10) must be augmented by two further
equations


(5.11) $x \cdot 0 = 0$,
(5.12) $x \cdot y' = x \cdot y + x$.


Thus the sum plays the role of an auxiliary function for the product.
The concept of a recursive function, as defined by Herbrand and Gödel,
is based on a generalization of the above procedure. An n-place function $\varphi$
is said to be recursive if there exists a finite system of equations $\Sigma$
containing a function symbol f corresponding to $\varphi$ and also, in general,
containing function symbols g, h, ... for auxiliary functions, such that for
every choice of n + 1 numbers $k_1, \ldots, k_n, k$ we have the following result:¹⁴
if $z_1, \ldots, z_n, z$ are the numerals corresponding to the numbers $k_1, \ldots, k_n, k$,
then the equation $f(z_1, \ldots, z_n) = z$ can be derived from $\Sigma$ if and only if
$\varphi(k_1, \ldots, k_n) = k$. In the process of derivation we may make use of two
rules corresponding to the ones given above: (a) in every equation of $\Sigma$
we may substitute numerals for the variables; (b) if for any given numerals
$z_1, \ldots, z_n, z$ and function symbol F we have already derived an equation
$F(z_1, \ldots, z_n) = z$, then on the right-hand side of any subsequently
derived equation we may replace $F(z_1, \ldots, z_n)$ by z.

The precise concept of a recursive function is to be regarded as
corresponding to the intuitive concept of a computable function. In particular,
the functions $x^y$, $x!$, $|x - y|$ are recursive.
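
The purely formal calculation of sums and products from the equations
(5.9) to (5.12) can be imitated in Python, with numerals written as the
symbol 0 followed by strokes for the successor. This is only a sketch under
our own conventions (the function names and the string representation are
assumptions); each recursive call corresponds to one use of an equation.

```python
def numeral(k):
    """The numeral for the number k: 0 followed by k successor strokes."""
    return "0" + "'" * k

def add(x, y):
    if y == "0":
        return x                      # (5.9)   x + 0 = x
    return add(x, y[:-1]) + "'"       # (5.10)  x + y' = (x + y)'

def mul(x, y):
    if y == "0":
        return "0"                    # (5.11)  x . 0 = 0
    return add(mul(x, y[:-1]), x)     # (5.12)  x . y' = x . y + x

print(add(numeral(2), numeral(2)))
print(mul(numeral(3), numeral(2)))
```

As in the text, the sum serves as an auxiliary function for the product.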


5.7. Consistency of an Algorithm and Consistency of Mathematics 


The formulas that can be derived by an algorithm consist of rows of 
single symbols (not necessarily letters in the ordinary sense of the word) 
from a given alphabet. In general, it will not be possible to derive all the 
various formulas that could be constructed from this alphabet. There will 
be at least one formula A whose derivability is ‘undesirable.’ Such a 
formula might, for example, be $Px \wedge \neg Px$ (cf. §4.7), or $x \neq x$, or
$| \equiv ||$.¹⁵ An algorithm K is called consistent with respect to a formula A
of this sort if A is not derivable. We are speaking here of the syntactical 
consistency already mentioned in §4. A consistency proof for K consists 
in a demonstration that A is not derivable. A consistency proof in the 
constructive sense must employ only self-evident assertions and must 
avoid all ideas that are problematical from the semantic point of view, 
e.g., the idea of the actual-infinite, since such ideas are not accepted by all 
mathematicians. On the other hand, it is considered acceptable to make 
use of inductive proofs concerning the structure of an algorithm. Let us 
give a simple example: the alphabet of the algorithm K consists of the 
two letters o and |. There is a single axiom 


(5.13) o.

As a rule of inference (with the proper variable e) we take


(5.14) $\frac{e}{e|}$.


¹⁴ Let us note the difference between numbers and numerals. It is customary to
regard numbers as some sort of ideal entities that are represented in writing by symbols
called numerals. In order to make the discussion more systematic, it is better here not
to use the ordinary Arabic numerals for the numbers but, as was mentioned above,
to represent the number 4, for example, by the numeral 0″″. Numbers cannot be
written down, but numerals can.

¹⁵ $x \equiv y$ means that x and y are the same formula.

In this algorithm the formula | is not derivable: that is, K is consistent
with respect to |. The proof is inductive: we cannot derive | from (5.13), 
since | is different from o. Also, we cannot derive | from (5.14) since 
every formula that can be derived from (5.14) must consist of more than 
a single letter. 

For many of the important algorithms in mathematics, it has been
possible to derive their consistency by “acceptable” methods of this sort,
sometimes called “finitary.” Moreover, the researches of Hilbert, Gentzen,
Ackermann, Schütte, Lorenzen, and others have proved the consistency
of the so-called ramified analysis closely connected with constructive
mathematics (cf. §1.4 and §1.5). On the other hand, no one has yet
succeeded in proving the consistency of classical analysis.

Even though algorithms are of great importance for mathematics, it is
still the opinion of many researchers that the whole of mathematics itself
cannot be regarded as an algorithm (cf. “Incompleteness of Arithmetic,”
§10.5). In this case it makes no sense to speak of the syntactical consistency
of mathematics as a whole.

For the constructivist school of mathematics, as represented, for 
example, by Curry and Lorenzen (§1.4, 5), all mathematical theorems 
are evident in the above sense. For the adherents of this school the whole 
of mathematics is a priori as reliable as a consistent algorithm. 


Exercises for §5 


1. From the functional equations

(5.9) $x + 0 = x$,
(5.10) $x + y' = (x + y)'$,
(5.11) $x \cdot 0 = 0$,
(5.12) $x \cdot y' = x \cdot y + x$


and the rules given in the text prove that
$0''' \cdot 0'' = 0''''''$ ($3 \cdot 2 = 6$).
2. Give recursion equations for the function $x^y$. From them prove that
$(0'')^{0''} = 0''''$ ($2^2 = 4$).
3. Introduce the functions
$x!$
(x) (predecessor of x; 0 if x = 0)
$a \dot{-} b$ (difference; 0 if a < b)


by recursion equations. Assume $a + b$ and $a \cdot b$ as functions already
defined.
4. Show that the calculus determined by the equations (5.9), (5.10),
together with the rules (a) and (b) given for them in the text, is
syntactically consistent.


5. If a set of natural numbers arranged in order of increasing magnitude 
is recursively enumerable, then it is also decidable. 


Bibliography 


For recursive functions and the concepts related to them see Davis [1],
Hermes [1], and Kleene [1]. 


6. Proofs 


6.1. Rules of Inference and Proofs 


Let there be given a system of axioms, say the axioms of Euclidean 
geometry. The theorem of Pythagoras is a consequence of these axioms, 
but that fact is not immediately obvious; it becomes so only step by step. 
Each step consists of the application of a rule of inference. A rule of 
inference is an instruction concerning a possible transition from certain 
preceding formulas (the premisses) to a subsequent formula (the conclusion).
A simple example with two premisses is the modus ponens (the
rule of separation)


(6.1) $\frac{H \rightarrow \Theta, \quad H}{\Theta}$


This rule enables us to make the transition from the two premisses
$H \rightarrow \Theta$ and H to the conclusion $\Theta$. An inference is a transition in
accordance with a rule of inference. A proof (derivation, deduction) is a finite
sequence of expressions each of which (unless it is an axiom) can be
derived from the preceding expressions by means of the rules of inference.
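
The definition of a proof can be made concrete by a small checker that
admits the modus ponens as its only rule of inference. The sketch below
rests on our own assumptions: atomic formulas are Python strings, an
implication from H to Θ is the tuple `("->", H, Theta)`, and the name
`is_proof` is hypothetical.

```python
def is_proof(lines, axioms):
    """True if every line is an axiom or follows from earlier lines
    by modus ponens (from H -> Theta and H proceed to Theta)."""
    derived = []
    for formula in lines:
        by_modus_ponens = any(("->", h, formula) in derived for h in derived)
        if formula not in axioms and not by_modus_ponens:
            return False
        derived.append(formula)
    return True

axioms = ["A", ("->", "A", "B"), ("->", "B", "C")]
proof = ["A", ("->", "A", "B"), "B", ("->", "B", "C"), "C"]

print(is_proof(proof, axioms))
print(is_proof(["A", "C"], axioms))
```

The second sequence is rejected because C is neither an axiom nor the
conclusion of an inference from the preceding line.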


6.2. A Complete System of Inference 

Although it is clear that there exist an infinite number of different rules 
of inference, in actual practice the mathematician makes use of only a very 
few of them, which recur again and again in many different arrangements. 
So we naturally ask whether it is possible to find a finite system of rules of 
inference by means of which we can deduce all the consequences of an
arbitrary system of axioms. Such a system may be called a complete system 
of rules of inference, and it is one of the basic discoveries of modern logic 
that, within certain limitations, complete systems of rules of inference 
actually exist. The limitations in each case depend on how much the given 
system of logic is able to express. For example, a complete system can be
found if we confine ourselves to axioms and to consequences expressible in
the language of predicate logic, which is sufficient for many parts of
mathematics. But the situation is different if we admit quantification of
predicate variables. See the “Incompleteness of the Extended Predicate
Logic” (§10).

The fact that within the framework of predicate logic every consequence 
can be derived by a finite system of rules of inference is described by 
saying that the predicate calculus determined by these rules is complete. 
The existence of such a calculus was foreseen by Leibniz in his demand 
for an ars inveniendi; to a certain extent it was experimentally verified by 
Whitehead and Russell in their monumental work Principia Mathematica 
(1910-1913) (based on the preliminary work of various logicians; in 
particular, Boole’s Algebra of Logic, 1847), and finally, in 1930, it was 
proved by Gödel in his famous Gödel completeness theorem. 

In the terminology of the foregoing section the Gödel completeness 
theorem asserts the existence of an algorithm for recursively enumerating 
all consequences of an arbitrary system of axioms that can be stated in 
the language of predicate logic. 


6.3. The Complete System of Rules of Inference of Gentzen 
(1934) and Quine (1950) 

Several different complete systems of rules of inference are known today 
but here we must restrict ourselves to the one which, since it is closely 
related to the ordinary reasoning of mathematicians, is called the “system 
of natural inference.” The advantage of close relationship with ordinary 
mathematical practice is gained at the expense of unnecessary loss of 
symmetry and formal elegance, so that in purely logical investigations it 
is customary to use other systems. 

For greater clarity let us make some preliminary remarks. A
characteristic feature of mathematical reasoning is the use of assumptions. 
Among the assumptions introduced during the course of a proof in any 
given mathematical theory we must include the axioms of the theory, 
or at any rate those axioms that are referred to in the proof. But in 
addition to the axioms, a mathematician will often introduce further 
(unproved) assumptions, on the basis of which the proof then proceeds. 
Of course, all assumptions that are made in this way must later be 
eliminated. 

A special case of the introduction of assumptions occurs in an indirect 
proof. Here we arbitrarily assume the negation of the theorem to be 
proved.16 Then in the course of the proof we try to reduce this assumption 
ad absurdum, that is, we try to deduce from it two mutually contradictory 
results. It should be emphasized that, at least from the point of view of 
the classical logic under discussion here, an indirect proof is just as good 
as any other (although the situation is different for other schools of logic; 
see §6.8). 

16 In case we wish to prove $\neg H$, we arbitrarily assume the proposition $H$
(cf. the last example in §6.6). 

Another characteristic feature of mathematical reasoning is the 
introduction of variables for entities whose existence is already known. 
Consider, for example, two nonparallel lines g and h in a plane. We know 
that g and h have at least one point in common (in particular if the two 
lines coincide), and then the mathematician will say something like, 
“let a be a point common to the two lines.” But the variable a here has 
no independent significance; it is meaningful only with respect to the 
proposition asserting its existence, a fact that must be kept in mind during 
the course of the proof. Variables of this sort also occur in the system of 
Gentzen and Quine, where they are called flagged variables. In order to 
avoid the danger of misunderstanding and consequent mistakes, it is not 
permissible to introduce the same variable for different entities during the 
course of a proof; this restriction is called the restriction against flagging 
the same variable twice. In general, a flagged variable will “depend” on 
other variables that have already appeared in the proof (in our example, 
a depends on g and h), in which case we stipulate that no variable may 
depend (even indirectly) on a second variable which in turn depends on 
the first; this restriction is called the restriction against circularity. 


6.4. List of the Rules of Gentzen and Quine 


For an explanation of these rules see §6.5, and the example of §6.6. 
Most of the rules have to do with the introduction or the elimination of a 
logical constant. 


The Rules of Gentzen and Quine for the Predicate Calculus 


Logical Constant    Introduction    Elimination 

$\wedge$             $\dfrac{H \quad \Theta}{H \wedge \Theta}$    $\dfrac{H \wedge \Theta}{H}$,  $\dfrac{H \wedge \Theta}{\Theta}$
$\vee$               $\dfrac{H}{H \vee \Theta}$,  $\dfrac{\Theta}{H \vee \Theta}$    $\dfrac{H \vee \Theta \quad H \to Z \quad \Theta \to Z}{Z}$
$\leftrightarrow$    $\dfrac{H \to \Theta \quad \Theta \to H}{H \leftrightarrow \Theta}$    $\dfrac{H \leftrightarrow \Theta}{H \to \Theta}$,  $\dfrac{H \leftrightarrow \Theta}{\Theta \to H}$
$\neg$               $\dfrac{H \to \Theta \quad H \to \neg\Theta}{\neg H}$    $\dfrac{H \quad \neg H}{\Theta}$
$\to$                $\dfrac{\Theta}{H \to \Theta}$    $\dfrac{H \to \Theta \quad H}{\Theta}$
$\bigvee$            $\dfrac{\Theta}{\bigvee_x H}\,{}^{18}$    $\dfrac{\bigvee_x H}{\Theta}\,{}^{17}$
$\bigwedge$          $\dfrac{\Theta}{\bigwedge_x H}\,{}^{17}$    $\dfrac{\bigwedge_x H}{\Theta}\,{}^{18}$


Two further rules without premisses (cf. §6.5): 


a. the rule for introduction of assumptions, 
b. the rule of tertium non datur. 


17 Assumption: $H$ becomes $\Theta$ by free renaming of x to a variable y, and conversely $\Theta$
becomes $H$ by the reverse renaming of y to x. The variable y must be flagged with respect 
to the free variables occurring in $\bigwedge_x H$ and $\bigvee_x H$. 

18 Assumption: $H$ becomes $\Theta$ by free renaming of the variable x to a variable y. (An 
exact definition of free renaming cannot be given here. We shall merely give a typical 
example: $H = (\bigwedge_u Pxu \wedge Qxy)$ becomes $\Theta = (\bigwedge_u Pyu \wedge Qyy)$ by free renaming of 
x to y.) Here y may also coincide with x. 


6.5. Explanation of the Rules 

By a proof we shall mean, here and in the rest of this section, a finite 
sequence of expressions that follow one another according to these 
rules. Here it must be emphasized that this precise definition of a proof is 
altogether necessary in studies of the foundations of mathematics, in 
contrast to the situation in ordinary unformalized mathematics, where it 
is not customary to state the rules of inference being used. The lines in a 
given proof can now be numbered. Each line consists of finitely many 
assumptions (perhaps none) and an assertion. As a typical example we 
take the rule for $\wedge$-introduction. This rule allows us to proceed from a line 
numbered i and a line numbered k ($i \gtrless k$) to a line numbered l (with 
l > i, l > k), whose assertion is the conjunction of the assertions of the 
ith and kth lines, and whose assumptions consist of the “juxtaposition” 
of the assumptions of the ith and kth lines; i.e., an expression is an 
assumption of the lth line if it is an assumption of the ith or of the kth 
line (the order in which the assumptions are written is of no importance, 
and an assumption which occurs several times may be written only once); 
schematically: 


Line Number    Assumptions                                        Assertion 
i              $A_1, \ldots, A_r$                                 $H$
k              $B_1, \ldots, B_s$                                 $\Theta$
l              $A_1, \ldots, A_r, B_1, \ldots, B_s$               $H \wedge \Theta$
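
The bookkeeping in this schema is easy to mirror in a program. What follows is a sketch under assumptions, not the formal system itself: a proof line is modelled as a set of assumption names plus an assertion string, and the names `ProofLine` and `and_introduction` are hypothetical. Using a set captures the remark that the order of the assumptions is immaterial and that repeated assumptions are written only once.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ProofLine:
    """One line of a proof: finitely many assumptions and one assertion."""
    assumptions: FrozenSet[str]
    assertion: str


def and_introduction(line_i: ProofLine, line_k: ProofLine) -> ProofLine:
    """Combine two lines: union of the assumptions, conjunction of the assertions."""
    return ProofLine(
        assumptions=line_i.assumptions | line_k.assumptions,
        assertion=f"({line_i.assertion} & {line_k.assertion})",
    )


line_i = ProofLine(frozenset({"A1", "A2"}), "H")
line_k = ProofLine(frozenset({"A2", "B1"}), "Theta")
line_l = and_introduction(line_i, line_k)
```

In this example the shared assumption `A2` appears only once among the assumptions of `line_l`.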


When use is made of the rules of $\bigvee$-elimination (elimination of the 
existential quantifier) or of $\bigwedge$-introduction (introduction of the 
universal quantifier), it is mandatory to flag a variable with a statement of 
the variables on which it depends. For example, if $u_1, \ldots, u_n$ are the free 
variables in $\bigvee_x H$, then in making use of the rule of $\bigvee$-elimination we 
must write the new line l as follows: 


Line    Flagged Variables         Assumptions             Assertion 
l       $y(u_1, \ldots, u_n)$     $A_1, \ldots, A_r$      $\Theta$


The procedure for the rule of $\bigwedge$-introduction is analogous.19 

19 In this rule the necessity for flagging is perhaps not immediately obvious: let us 
motivate it by the remark that the rule for $\bigwedge$-introduction is dual to the rule for 
$\bigvee$-elimination. 

The rule for $\to$-introduction may also be called assumption-elimination: 
for if $H$ is an arbitrary assumption of the initial line (see the list of rules), 
then $H$ will no longer occur as an assumption in the final line of the proof. 
In contrast to the rules described up to now, which allow us to pass from 
one, two, or three lines of the proof to a new line, the two rules of 
assumption-introduction and tertium non datur allow us to write down a 
line in the proof without making use of any preceding line. The rule of 
assumption-introduction consists simply of writing down an arbitrary 
proposition both as assumption and as assertion: 


Line Number    Assumptions    Assertion 
l              $H$            $H$


The rule of tertium non datur allows us to write down any particular case 
of tertium non datur without assumptions: 


Line Number    Assumptions    Assertion 
l              -              $H \vee \neg H$
The last line of a finished proof must not contain any flagged variable 
as a free variable, since such a variable has no independent significance. 
Also, after constructing such a proof, we must verify that we have observed 
the restrictions against flagging a variable twice and against circularity.20 


It can be proved that the assertion of the last line of a finished proof is a 
consequence (in the sense of §3.6) of the assumptions of the last line of the 
proof. Conversely, if $\Theta$ is a consequence of $H_1, \ldots, H_n$, then there always 
exists a finished proof with a last line whose assumptions are $H_1, \ldots, H_n$
and whose assertion is $\Theta$. 


In the present sense of the word, a proof is analogous to a schematic procedure 
for making a computation. Thus the process of proof has all the advantages 
and disadvantages of other schematic procedures that have been developed 
in mathematics. The advantage lies in the fact that in a mechanical procedure 
of this sort it is no longer necessary to do any thinking, or at least not as much 
as before, although this advantage can only be gained at the cost of considerable 
training in the art of carrying out the procedure. The disadvantage of a schematic 
procedure is that the rules which are simplest from the formal point of view 
are not always the ones that are most immediately obvious to the human mind. 

On the other hand, if we wish to explain why exactly these formal rules 
were chosen, and no others, our explanation must be based on arguments 
whose meaning is intuitively clear. For lack of space we cannot give a detailed 
explanation here and will merely make a few remarks: the rules for $\vee$-introduction 
express the fact that if we have proved an assertion $H$ under certain assumptions, 
then under the same assumptions we may make the weaker assertion $H \vee \Theta$
or $\Theta \vee H$. This rule is used in arithmetic, for example, in making
approximations where we proceed from an already proved assertion of the form $x < 1$
to the weaker assertion $x \leq 1$ (i.e., $x < 1 \vee x = 1$). The rule for $\neg$-elimination 
expresses the following fact: if the assertion $H$ follows from certain assumptions, 
and the assertion $\neg H$ from certain other assumptions, then the two sets of 
assumptions taken together form an inconsistent system from which an arbitrary 
proposition $\Theta$ follows trivially. The rule for $\to$-introduction means only that 
if a proposition $\Theta$ follows from certain assumptions, including in particular 
the assumption $H$, then the proposition “if $H$ then $\Theta$” follows from the same set 
of assumptions excluding $H$. 

We now give two examples of proofs. The reader is advised to direct his 
attention less to the actual meaning of the steps in the proof than to the question 
whether the above formal rules have been correctly applied. Of course, this 
will cost him some effort, comparable to the effort required when one undertakes 
for the first time to solve a quadratic equation by some formal procedure. 

20 The restriction against flagging a variable twice prevents us from proceeding from 
$\bigvee_x H$ through $H$ to $\bigwedge_x H$, since x would have to be flagged twice; and even if we introduce 
a new variable y, we cannot pass from $\bigvee_x H$ to $\bigwedge_x H$ without double flagging. 


6.6. Two Examples of Proofs 


We begin with a proof that $H$ follows from $\neg\neg H$. This fact, which is 
valid only in classical logic, makes use of the tertium non datur. In the 
right-hand column we indicate the rule and the preceding lines that 
justify the step taken in each line. 


Line Number    Assumptions               Assertion          Rule Used 
1              $\neg\neg H$              $\neg\neg H$       introduction of assumption 
2              $\neg H$                  $\neg H$           introduction of assumption 
3              $\neg\neg H, \neg H$      $H$                $\neg$-elimination (2, 1) 
4              $\neg\neg H$              $\neg H \to H$     elimination of assumption (3) 
5              $H$                       $H$                introduction of assumption 
6              -                         $H \to H$          elimination of assumption (5) 
7              -                         $H \vee \neg H$    tertium non datur 
8              $\neg\neg H$              $H$                $\vee$-elimination (7, 6, 4) 


Since we have used only rules from the propositional calculus, there has 
been no need to flag variables. 
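
Since the assertion of each line is meant to be a consequence of that line's assumptions, the eight lines above can be cross-checked semantically by a truth table. The sketch below does only that; it does not verify that the rules were applied correctly, and the helper names (`neg`, `implies`, `either`, `line_is_sound`) are hypothetical. It assumes that $H$ is treated as a single propositional variable.

```python
from typing import Callable, List, Tuple

# A formula in the single variable H, modelled as a truth function.
Prop = Callable[[bool], bool]


def neg(p: Prop) -> Prop:
    return lambda h: not p(h)


def implies(p: Prop, q: Prop) -> Prop:
    return lambda h: (not p(h)) or q(h)


def either(p: Prop, q: Prop) -> Prop:
    return lambda h: p(h) or q(h)


H: Prop = lambda h: h
NOT_H = neg(H)
NOT_NOT_H = neg(NOT_H)

# (assumptions, assertion) for lines 1 to 8 of the derivation above.
PROOF: List[Tuple[List[Prop], Prop]] = [
    ([NOT_NOT_H], NOT_NOT_H),
    ([NOT_H], NOT_H),
    ([NOT_NOT_H, NOT_H], H),
    ([NOT_NOT_H], implies(NOT_H, H)),
    ([H], H),
    ([], implies(H, H)),
    ([], either(H, NOT_H)),
    ([NOT_NOT_H], H),
]


def line_is_sound(assumptions: List[Prop], assertion: Prop) -> bool:
    """The assertion holds under every valuation satisfying all assumptions."""
    return all(
        assertion(value)
        for value in (False, True)
        if all(a(value) for a in assumptions)
    )
```

Every line of `PROOF` passes `line_is_sound`; line 3 passes vacuously, because its two assumptions can never hold together.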

In the same way we can prove the four rules of contraposition, by which 
we mean the following steps: (1) from $H \to \Theta$ to $\neg\Theta \to \neg H$, (2) from 
$H \to \neg\Theta$ to $\Theta \to \neg H$, (3) from $\neg H \to \Theta$ to $\neg\Theta \to H$, (4) from 
$\neg H \to \neg\Theta$ to $\Theta \to H$.22 

As a second example (see the table below) we wish to give part of an indirect 
proof and choose for this purpose the proof of the irrationality of $\sqrt{2}$. 
We use the variables p, q, r, s, t, u, x, y, z for positive integers and take 
advantage of the fact that a rational number can be represented as the 
quotient of two natural numbers which have no factor in common and 
thus, in particular, are not both even. Then our problem is to prove the 
proposition 


(6.2)    $\neg \bigvee_p \bigvee_q (2q^2 = p^2 \wedge \neg(2 \mid p \wedge 2 \mid q))$. 


Since $2 \mid p$ is only an abbreviation for $\bigvee_s 2s = p$, we may rewrite (6.2) 
in the form 


(6.3)    $\neg \bigvee_p \bigvee_q (2q^2 = p^2 \wedge \neg(\bigvee_s 2s = p \wedge \bigvee_s 2s = q))$, 


which we shall now abbreviate to $\neg H_0$. Here the axioms of arithmetic are 
indicated simply by $A_1, \ldots, A_n$. From $A_1, \ldots, A_n$ it follows that an 
arbitrary positive integer u is even if its square is even. We have made use 
of this fact in line 6 and, strictly speaking, we should give a 
complete proof of it. The same remark applies to line 11, which expresses 
an elementary result from arithmetic. 


Line    Flagged Variables    Assumptions                   Assertion    Rule Used 

1       -       $H_0$                         $\bigvee_p \bigvee_q (2q^2 = p^2 \wedge \neg(\bigvee_s 2s = p \wedge \bigvee_s 2s = q))$    introduction of assumption 
2       p       $H_0$                         $\bigvee_q (2q^2 = p^2 \wedge \neg(\bigvee_s 2s = p \wedge \bigvee_s 2s = q))$    $\bigvee$-elimination (1) 
3       q(p)    $H_0$                         $2q^2 = p^2 \wedge \neg(\bigvee_s 2s = p \wedge \bigvee_s 2s = q)$    $\bigvee$-elimination (2) 
4       -       $H_0$                         $2q^2 = p^2$    $\wedge$-elimination (3) 
5       -       $H_0$                         $\bigvee_t 2t = p^2$ (note 23)    $\bigvee$-introduction (4) 
6       -       $A_1, \ldots, A_n$            $\bigwedge_u (\bigvee_t 2t = u^2 \to \bigvee_s 2s = u)$    (arithmetic) 
7       -       $A_1, \ldots, A_n$            $\bigvee_t 2t = p^2 \to \bigvee_s 2s = p$    $\bigwedge$-elimination (6) 
8       -       $H_0, A_1, \ldots, A_n$       $\bigvee_s 2s = p$    $\to$-elimination (7, 5) 
9       s(p)    $H_0, A_1, \ldots, A_n$       $2s = p$    $\bigvee$-elimination (8) 
10      -       $H_0, A_1, \ldots, A_n$       $2q^2 = p^2 \wedge 2s = p$    $\wedge$-introduction (4, 9) 
11      -       $A_1, \ldots, A_n$            $\bigwedge_x \bigwedge_y \bigwedge_z (2x = y^2 \wedge 2z = y \to 2z^2 = x)$    (arithmetic) 
12      -       $A_1, \ldots, A_n$            $\bigwedge_y \bigwedge_z (2q^2 = y^2 \wedge 2z = y \to 2z^2 = q^2)$ (note 24)    $\bigwedge$-elimination (11) 
13      -       $A_1, \ldots, A_n$            $\bigwedge_z (2q^2 = p^2 \wedge 2z = p \to 2z^2 = q^2)$    $\bigwedge$-elimination (12) 
14      -       $A_1, \ldots, A_n$            $2q^2 = p^2 \wedge 2s = p \to 2s^2 = q^2$    $\bigwedge$-elimination (13) 
15      -       $H_0, A_1, \ldots, A_n$       $2s^2 = q^2$    $\to$-elimination (14, 10) 
16      -       $H_0, A_1, \ldots, A_n$       $\bigvee_t 2t = q^2$    $\bigvee$-introduction (15) 
17      -       $A_1, \ldots, A_n$            $\bigvee_t 2t = q^2 \to \bigvee_s 2s = q$    $\bigwedge$-elimination (6) 
18      -       $H_0, A_1, \ldots, A_n$       $\bigvee_s 2s = q$    $\to$-elimination (17, 16) 
19      -       $H_0, A_1, \ldots, A_n$       $\bigvee_s 2s = p \wedge \bigvee_s 2s = q$    $\wedge$-introduction (8, 18) 
20      -       $H_0$                         $\neg(\bigvee_s 2s = p \wedge \bigvee_s 2s = q)$    $\wedge$-elimination (3) 
21      -       $H_0, A_1, \ldots, A_n$       $\neg H_0$    $\neg$-elimination (19, 20) 
22      -       $A_1, \ldots, A_n$            $H_0 \to \neg H_0$    elimination of assumption (21) 
23      -       -                             $H_0 \to H_0$    elimination of assumption (1) 
24      -       $A_1, \ldots, A_n$            $\neg H_0$    $\neg$-introduction (23, 22) 


21 A further example is given in §11.2. 

22 The last two rules are not valid in the logic of intuitionism, which also rejects the 
step from $\neg\neg H$ to $H$. 

23 Strictly interpreted, the rule of existence-introduction in §6.4 relates $\bigvee_t 2t = p^2$
to a premiss obtained by introducing a suitable variable for the variable t. But that is 
not exactly what we are doing here, since we must replace t by $q^2$, and $q^2$ is not a 
variable. The difficulty lies in the fact that for simplicity in the above example, and 
for consistency with the nomenclature of ordinary mathematics, we have used the 
functional notation, which, in principle, we could have avoided, as we have seen in §2.5. 

24 Here we have replaced x by $q^2$. Cf. the preceding note. 

It is easy to verify that we have now constructed a finished proof in 
which we have respected the restrictions against flagging the variable 
twice and against circularity. 

From this example it is clear that proofs in the precise sense in which 
we are now using the word are generally much longer than the “proofs” 
of ordinary mathematics. This fact should cause no surprise, since we 
are employing only a few rules of inference of a very elementary character. 


6.7. Recursive Enumerability and Decidability in the Predicate Logic 


The calculus discussed above has provided us with a procedure (an 
ars inveniendi) for recursively enumerating the theorems of any theory 
that is axiomatized in the language of the predicate logic. The verification 
of the correctness of any proof can be carried out, at least in principle, 
by a machine, since we are dealing here only with simple formal 
relationships among rows of symbols. Thus, it is a decidable question 
whether or not a given sequence of expressions is a proof. 

But it must be emphasized that such a calculus does not enable us, for 
an arbitrary finite system of axioms $\mathfrak{A}$ and an arbitrarily given expression 
$H$, to decide whether or not $H$ follows from $\mathfrak{A}$. To decide such a question 
would require an ars iudicandi in the sense of Leibniz, and since 1936 
it is known (Church) that for the predicate logic such a decision procedure 
cannot exist. 


6.8. Nonclassical Systems of Rules 


As was pointed out in §6.5, the rules given in §6.4 for the predicate logic 
can be established semantically. But the nonclassical conceptions of logic 
can lead to corresponding systems of rules that are not necessarily 
equivalent to the system described here. For example, the rule of tertium 
non datur is not valid for a potential interpretation of infinity (cf. §1.4). 


Exercise for §6 

Let the axioms for a group be given in the following form: 

M      (Multiplication)      $\bigwedge_x \bigwedge_y \bigvee_z \; xy = z$
A      (Associative law)     $\bigwedge_x \bigwedge_y \bigwedge_z \; x(yz) = (xy)z$
U      (Unity)               $\bigwedge_x \; xe = x$
I      (Inverse)             $\bigwedge_x \bigvee_y \; xy = e$
$E_1$  (Equality)            $T = T$
$E_2$  (Equality)            $H(T_1) \wedge T_1 = T_2 \to H(T_2)$


Here $T$, $T_1$, $T_2$ denote terms, e.g., ab, (ab)c and so forth, and $H(T_1)$ is 
an arbitrary term-equation containing the term $T_1$. Also, $E_1$ and $E_2$ are 
axiom-schemes (§4.1). The axiom $E_2$ can be represented more conveniently, 
and equivalently, by the additional rule of inference 


$E_2$    $\dfrac{H(T_1), \quad T_1 = T_2}{H(T_2)}$


From these axioms construct a proof for the propositional form 

$\bigwedge_a \bigvee_b \; ba = e$

(existence of a left inverse). 


7. Theory of Sets 


7.1. Introductory Remarks 

Many definitions and theorems contain such expressions as set, totality, 
class, domain, and so forth. For example, in the definition of a real number 
by means of a Dedekind cut (see IB1, §4.3) the totality of the rational 
numbers is divided into two non-empty classes, a first or lower and a 
second or upper class. An ordered set (cf. §7.2) is said to be well-ordered 
if every non-empty subset contains a smallest element. Again, we may 
visualize a real function as the set of points of a curve and may speak of 
its domain and range (§8.3). Finally, we have already spoken of a domain 
of individuals in our definition of the concept of a mathematical con- 
sequence (§3.6). 

The concept of a set, which is thus seen to be of fundamental importance, 
was for a long time regarded as being so intuitively clear as to need no 
further discussion. Cantor (1845-1918) was the first to subject it to 
systematic study. His definition of a set (not a definition in the strict 
mathematical sense of the word but only a useful hint in the right direction) 
runs as follows: A “set” is any assemblage, regarded as one entity M, of 
definite and separate objects m of our perception or thought. 

The Cantor theory of sets developed rapidly and soon exercised a 
great influence on many branches of mathematics, such as the theory of point 
sets, real functions, topology, and so forth. 

But with the discovery of contradictions—the so-called antinomies of 
the theory of sets (cf. §7.2 and §11)—the foundations of the theory, and 
therewith of the whole of classical mathematics, were placed in jeopardy. 
The discussion of this problem, which is still continuing, has contributed 
in an essential way to the development of modern research on the foun- 
dations of mathematics. The various schools of thought have made 
several suggestions for the construction of a theory of sets; let us 
mention a few of the most important. 

The naive or intuitive theory of sets simply attempts to avoid the 
introduction of contradictory concepts. Frege and Russell tried (logicism) 
to reduce the theory of sets to logic. Zermelo, von Neumann, and others 
have introduced systems of axioms for the theory of sets from which it is 
possible to deduce many of the theorems of the naive theory. The con- 
sistency of these systems of axioms remains an open question (cf. §4.7). 
Still other authors insist that a set must be explicitly definable by a 
linguistic expression (a propositional form with a free variable), which 
must then satisfy certain additional conditions, depending on the school 
of thought to which the author belongs. 

In Sections 7, 8, and 9 we deal chiefly with the naive theory of sets; 
as for the axiomatic theory, we confine ourselves to a brief description 
of one of the various systems in use. The three sections are closely related 
to one another in subject matter and are separated here only for 
convenience. 


7.2. Naive Theory of Sets 


The Cantor definition of a set makes it natural for us to gather into 
one set all the entities that have a given property; for example: (1) the set 
of chairs in the room (these are objects of our perception), or (2) the set 
of even numbers (objects of our thought). To denote variables for sets 
and their elements we use the Latin letters a, b, c, ..., M, N, ..., and so 
forth. To express the fact that y is an element of x we write $y \in x$, and for 
$\neg\, y \in x$ we also write $y \notin x$. It is possible for one set to be an element of 
another set. Sets that contain the same elements are regarded as being 
equal, i.e., 


(7.1)    $\bigwedge_x (x \in a \leftrightarrow x \in b) \to a = b$. 


This requirement is called the principle of extensionality. Thus a set is 
determined by the elements “contained” in it, by its content or extension. 

The property of being a prime number between eight and ten defines 
a set that contains no element. By the principle of extensionality there can 
be only one such set, which is called the empty set, and is here denoted by 0, 
although some authors use the special symbol $\emptyset$. 

Let us now define the simplest set-theoretic concepts: A set a is called a 
subset of b (a is contained in b, $a \subseteq b$) if $\bigwedge_x (x \in a \to x \in b)$. If, in addition, 
$a \neq b$, then a is a proper subset of b or is properly contained in b ($a \subset b$). 
The set c is called the union of a and b ($c = a \cup b$) if 
$\bigwedge_x (x \in c \leftrightarrow x \in a \vee x \in b)$. The set c is the intersection of a and b 
($c = a \cap b$) if $\bigwedge_x (x \in c \leftrightarrow x \in a \wedge x \in b)$. 
Two sets a, b are disjoint if they have no element in common, i.e., 
$a \cap b = 0$. The complement25 $\bar{x}$ of a set x is the set of all elements which 
are not elements of x. But here we must be careful, since the complement 
of the empty set is then the “universal set,” which easily leads to
contradictions (cf. §7.5). These contradictions can be avoided if we consider 
only subsets of a certain fixed set M. Then $\bar{x}$ is the set of y with 
$y \in M \wedge y \notin x$. 
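
For finite sets these definitions correspond directly to the operations on Python's built-in `frozenset`. The following is an illustrative sketch only; the particular sets and the helper name `complement` are assumptions made for the example, with `M` playing the part of the fixed set relative to which complements are taken.

```python
# Fixed set M; complements are taken relative to it, as in the text.
M = frozenset(range(10))

a = frozenset({1, 2, 3})
b = frozenset({1, 2, 3, 4, 5})
c = frozenset({7, 8})
empty = frozenset()  # the empty set, written 0 in the text


def complement(x: frozenset, universe: frozenset = M) -> frozenset:
    """All y with y in the universe and y not in x."""
    return universe - x


is_subset = a <= b           # a is contained in b
is_proper_subset = a < b     # a is contained in b and a != b
union = a | c                # elements of a or of c
intersection = a & b         # elements of both a and b
are_disjoint = a & c == empty
```

Note that `complement(empty)` is just `M`, so no “universal set” is needed once everything is a subset of `M`.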

It is convenient to illustrate these concepts with sets of points in the 
plane: 


[Figs. 1-4. Fig. 1: $a \subset b$. Fig. 2: $a \cup b$. Fig. 3: a and b disjoint. Fig. 4: M.] 


By the power set $\mathfrak{P}a$ of a set a we mean the set of all subsets of a: 
$\bigwedge_x (x \in \mathfrak{P}a \leftrightarrow x \subseteq a)$. The set that contains x as its single element is 
written {x},26 and correspondingly {x, y} is the set containing exactly the 
two elements x and y, and so forth. For example, {0} contains exactly 
one element, namely the empty set, whereas 0 contains no element at all. 
In a set-theoretic treatment of functions (cf. §8) an important role is 
played by the ordered pairs $\langle x, y \rangle$,27 defined by 


(7.2)    $\langle x, y \rangle = \{\{x\}, \{x, y\}\}$. 


From $\langle x, y \rangle = \langle u, v \rangle$ follows $x = u \wedge y = v$. Thus the order of the 
components in an ordered pair is significant.28 
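
Definition (7.2) and the power set can both be tried out on small finite sets. This is a sketch under the assumption that sets are modelled by `frozenset`; the function names `ordered_pair` and `power_set` are hypothetical.

```python
from itertools import combinations


def ordered_pair(x, y) -> frozenset:
    """The pair <x, y> as the set {{x}, {x, y}} of definition (7.2)."""
    return frozenset({frozenset({x}), frozenset({x, y})})


def power_set(a: frozenset) -> frozenset:
    """The set of all subsets of the finite set a."""
    items = list(a)
    return frozenset(
        frozenset(subset)
        for size in range(len(items) + 1)
        for subset in combinations(items, size)
    )


empty = frozenset()
singleton_of_empty = frozenset({empty})  # {0}: exactly one element
```

For instance, `ordered_pair(1, 2)` and `ordered_pair(2, 1)` are different sets, although `{1, 2}` and `{2, 1}` are the same set; and `ordered_pair(3, 3)` collapses to `{{3}}`.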


25 The complement of x is often denoted by “x'.” 

26 {x} and x differ from each other, since in general x does not have x as its only 
element. Nevertheless, in cases where no confusion can arise, it is customary to write 
x for {x}. 

27 Ordered pairs are also denoted by (x, y). 

28 For sequences of symbols the construction of ordered pairs may be carried out 
simply by means of juxtaposition and a suitable symbol for separation. 

It is easy to prove the following rules, which lead us to speak of an 
algebra of sets (cf. §9) or of a field of sets. 


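Laws for $\cap$ and $\cup$: 
(1) The commutative laws: 
$a \cap b = b \cap a$, $a \cup b = b \cup a$. 
(2) The associative laws: 
$a \cap (b \cap c) = (a \cap b) \cap c$, $a \cup (b \cup c) = (a \cup b) \cup c$. 
(3) The absorption laws: 
$a \cap (a \cup b) = a$, $a \cup (a \cap b) = a$. 
(4) The distributive laws: 
$a \cap (b \cup c) = (a \cap b) \cup (a \cap c)$, 
$a \cup (b \cap c) = (a \cup b) \cap (a \cup c)$. 
Laws for $\subseteq$: 
(1) The reflexive law: 
$a \subseteq a$. 
(2) The identitive law: 
$a \subseteq b \wedge b \subseteq a \to a = b$. 
(3) The transitive law: 
$a \subseteq b \wedge b \subseteq c \to a \subseteq c$. 
Thus, the relation $\subseteq$ is a partial ordering (in the sense of §8.3). 
Laws for $\subseteq$, $\cap$, $\cup$: 
(1) $a \subseteq b \leftrightarrow a \cap b = a$, $a \subseteq b \leftrightarrow b \cup a = b$, 
(2) $a \subseteq b \cap c \leftrightarrow a \subseteq b \wedge a \subseteq c$, $a \cup b \subseteq c \leftrightarrow a \subseteq c \wedge b \subseteq c$. 
Laws for complementation (a, b are subsets of m): 
(1) $a = b \leftrightarrow \bar{a} = \bar{b}$, (2) $\bar{\bar{a}} = a$, 
(3) $a \subseteq b \leftrightarrow \bar{b} \subseteq \bar{a}$, (4) $\bar{0} = m$, $\bar{m} = 0$, 
(5) $\overline{a \cap b} = \bar{a} \cup \bar{b}$, $\overline{a \cup b} = \bar{a} \cap \bar{b}$, 
(6) $a \subseteq b \leftrightarrow a \cap \bar{b} = 0$, $a \subseteq b \leftrightarrow \bar{a} \cup b = m$. 


Laws for 0 and m (a is a subset of m): 


(1) $a \cup 0 = a$, $a \cap m = a$, 
(2) $a \cap 0 = 0$, $a \cup m = m$. 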
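
A brute-force check over all subsets of a small set m is no substitute for the proofs, but it is a quick way to catch a misstated law. The sketch below assumes m = {1, 2, 3} and tests a selection of the laws listed above; the names `comp` and `laws_hold` are hypothetical.

```python
from itertools import combinations

m = frozenset({1, 2, 3})
empty = frozenset()

# All eight subsets of m.
subsets = [
    frozenset(s)
    for size in range(len(m) + 1)
    for s in combinations(sorted(m), size)
]


def comp(x: frozenset) -> frozenset:
    """Complement relative to m."""
    return m - x


def laws_hold() -> bool:
    """Check selected laws for every choice of subsets a, b, c of m."""
    for a in subsets:
        if (a | empty) != a or (a & m) != a:
            return False
        if (a & empty) != empty or (a | m) != m:
            return False
        if comp(comp(a)) != a:
            return False
        for b in subsets:
            checks = [
                (a & (a | b)) == a,                         # absorption
                (a | (a & b)) == a,
                (a <= b) == ((a & b) == a),
                (a <= b) == (comp(b) <= comp(a)),
                comp(a & b) == (comp(a) | comp(b)),         # complement of a meet
                comp(a | b) == (comp(a) & comp(b)),         # complement of a join
                (a <= b) == ((a & comp(b)) == empty),
                (a <= b) == ((comp(a) | b) == m),
            ]
            if not all(checks):
                return False
            for c in subsets:
                if (a & (b | c)) != ((a & b) | (a & c)):    # distributive laws
                    return False
                if (a | (b & c)) != ((a | b) & (a | c)):
                    return False
    return True
```

Calling `laws_hold()` runs through all 8 × 8 × 8 triples of subsets.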




Up to now we have introduced the concept of union for two sets only, 
but it is often necessary to consider the union of arbitrarily many sets. 
Let M be a set of sets. Then by $\bigcup_{x \in M} x$ we denote the set of elements y 
belonging to at least one x in M. Correspondingly, as a generalization of 
the intersection of two sets, we write $\bigcap_{x \in M} x$ for the set of those elements 
y which belong to every x in M. 


7.3. Cardinal Numbers in the Naive Theory of Sets 

One of the most important concepts introduced by Cantor is that of the 
power or cardinality of a set. It represents an extension to infinite sets of 
the number of objects in a finite set. Two sets x, y are said to be equivalent 
(x ~ y) if a one-to-one correspondence can be set up between the elements 
of x and those of y. For example, the set {1, 2, 3} and {0, {0}, {{0}}} are 
equivalent; moreover, the set of natural numbers and the set of squares are 
equivalent, as is shown by the following correspondence between them: 


0    1    2    3    4    ... 
↕    ↕    ↕    ↕    ↕ 
0    1    4    9    16   ... 

This example shows that an “infinite” set a can be equivalent to a proper 
part of itself, a property which is usually taken as the definition of infinity 
(Dedekind definition of infinity). The cardinal number29 $\overline{\overline{x}}$ of a set x is then 
regarded as representing “that which is common” to all sets that are 
equivalent to x. Thus, we might say that the cardinal number of x is simply 
the set of all sets that are equivalent to x, although such a definition is 
problematical on account of its relationship to the universal set. On the 
other hand, among all the sets that are equivalent to x we could choose 
one definite set as a representative of x and then say that this set is the 
cardinal number of x. But the problematical feature of such a definition 
is that we do not know how to decide which set should be chosen as the 
representative. In any case we have 


(7.3)    $x \sim y \leftrightarrow \overline{\overline{x}} = \overline{\overline{y}}$. 


The cardinal number of a finite set can simply be identified with the number 
of elements in the set. 
For all sets, finite or infinite, we have the Bernstein equivalence theorem: 


If $x \subseteq y$ and $y \subseteq z$ and $x \sim z$, then $y \sim z$. 


An ordering $\leq$ for the cardinal numbers (cf. §8.3) can be defined by 
setting $\overline{\overline{x}} \leq \overline{\overline{y}} \leftrightarrow \bigvee_z (z \subseteq y \wedge x \sim z)$, 
$\overline{\overline{x}} < \overline{\overline{y}} \leftrightarrow \overline{\overline{x}} \leq \overline{\overline{y}} \wedge \overline{\overline{x}} \neq \overline{\overline{y}}$ (cf. §7.4). 


oo 
. 


29 Cardinal numbers are also often denoted by “% 

The cardinal number of the set of natural numbers is denoted by $\aleph_0$
(pronounced aleph-zero). If $\overline{\overline{x}} < \aleph_0$, the cardinal number $\overline{\overline{x}}$ is said to be 
finite, but if $\overline{\overline{x}} \geq \aleph_0$, then $\overline{\overline{x}}$ is transfinite. If $\overline{\overline{x}} = \aleph_0$, then $\overline{\overline{x}}$ is countable, 
and if $\overline{\overline{x}} \leq \aleph_0$, then $\overline{\overline{x}}$ is at most countable.30 A transfinite cardinal number 
that is not countable is said to be uncountable. A set is called countable, 
at most countable, or uncountable if its cardinal number has the
corresponding property. Finite cardinal numbers correspond to finite sets, and 
transfinite cardinal numbers to infinite sets. 

The set of rational numbers is countable. The truth of this assertion is 
evident from the following schema in which every ‘‘positive” rational 
number occurs at least once (first Cantor diagonal procedure): 


[Schema: the rational numbers arranged in an infinite array with first row 
0 → 1, 2 → 3, 4 → 5, ..., and traversed along the successive diagonals.] 
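
The idea behind the schema can be imitated by a generator that walks through the fractions diagonal by diagonal. This is a sketch of one common variant, not a transcription of the schema: it covers the strictly positive rationals only, groups fractions by the sum of numerator and denominator, and skips repeated values so that each rational appears exactly once. The name `positive_rationals` is hypothetical.

```python
from fractions import Fraction
from itertools import count, islice
from typing import Iterator


def positive_rationals() -> Iterator[Fraction]:
    """Yield every positive rational exactly once, diagonal by diagonal."""
    seen = set()
    for total in count(2):                  # numerator + denominator == total
        for numerator in range(1, total):
            value = Fraction(numerator, total - numerator)
            if value not in seen:           # e.g. 2/2 repeats 1/1 and is skipped
                seen.add(value)
                yield value


first_ten = list(islice(positive_rationals(), 10))
```

Because every fraction p/q lies on the finite diagonal p + q, it is reached after finitely many steps, which is exactly what countability requires.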


The existence of uncountable sets was first proved by Cantor by his 
second diagonal procedure: the set of real numbers $\alpha$ with $0 < \alpha < 1$ is 
uncountable. Proof: let us assume that we have set up a one-to-one 
correspondence between these numbers and the positive integers: 


$\alpha_1 = 0.\, a_{11} a_{12} a_{13} \cdots$
$\alpha_2 = 0.\, a_{21} a_{22} a_{23} \cdots$
$\alpha_3 = 0.\, a_{31} a_{32} a_{33} \cdots$
. . . . . . . . . . 


Here the real numbers have been written as infinite decimals, so that 
$0 \leq a_{ik} \leq 9$. Now let us form the number $\alpha' = 0.\, a'_1 a'_2 a'_3 \ldots$, where $a'_i = 1$
if $a_{ii} \neq 1$, and $a'_i = 2$ if $a_{ii} = 1$. Then $\alpha'$ differs from every number 
listed above in at least one decimal place, since it differs from $\alpha_n$ in the 
nth place, and thus $\alpha'$ is not included in the list. Since $0 < \alpha' < 1$, our 
assumption is wrong and the theorem is proved. 
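
The construction of the new digits can be carried out on any finite initial part of such a list. The sketch below assumes the list is given as rows of decimal digits, with row n holding at least n digits; the name `diagonal_digits` is hypothetical.

```python
from typing import List


def diagonal_digits(rows: List[List[int]]) -> List[int]:
    """Digit i is 1 unless the i-th diagonal digit is 1, in which case it is 2."""
    return [1 if row[i] != 1 else 2 for i, row in enumerate(rows)]


# Four listed numbers, each given by its first four decimal digits.
listing = [
    [1, 4, 1, 5],
    [7, 1, 8, 2],
    [3, 3, 3, 3],
    [0, 0, 0, 1],
]
new_digits = diagonal_digits(listing)
```

By construction `new_digits` disagrees with row n in position n, so it cannot coincide with any row of the list.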

The correspondence set up in Figure 5 shows that the set of all real 
numbers, often called the continuum, has the same power as the set of 
real numbers in the interval just considered. 


30 The terms countable and countably infinite are often used in the sense of our 
“at most countable” and “countable,” respectively. 

[Fig. 5: the correspondence between the set of all real numbers and the 
interval considered above.] 


7.4. Ordinal Numbers in the Naive Theory of Sets 


A set x for which an order (cf. §8) has been defined is called an ordered set. 
Two ordered sets that can be put into one-to-one correspondence with 
each other with preservation of the order (so that they are isomorphic 
in the sense of §8.4) are said to be similar. By the order type $|x|$ we mean 
“that which is common” to all sets similar to the given ordered set x 
(cf. the remarks on the concept of a cardinal number in §7.3). If the 
ordering of x is a well-ordering in the sense of §8.3, then $|x|$ is an ordinal 
number. For the ordinal numbers we can define an ordering <, which turns 
out to be a well-ordering, by setting 
$|x| < |y| \leftrightarrow \bigvee_z (z \subseteq y \wedge |x| = |z|) \wedge |x| \neq |y|$. 
For every ordinal number $\beta$ the set of ordinal numbers $\alpha$
with $\alpha < \beta$ in the ordering < is itself a representative of $\beta$. The
well-ordering theorem, which can be proved on the basis of the axiom of 
choice (cf. §7.6), states that every set can be well-ordered. Only by means 
of this theorem can we prove that the relation < defined for the cardinal 
numbers in §7.3 is an ordering and in fact a well-ordering. 

A non-empty set S of ordinal numbers is called a number class if (1) any 
two members of the set are equivalent (§7.3), and (2) every ordinal that is 
equivalent to a member of S belongs to S. Thus, every cardinal number determines a 
number class. To every finite cardinal number corresponds exactly one 
ordinal number, so that the corresponding class has only one element. 
But the number classes corresponding to transfinite cardinal numbers have 
infinitely many elements. 

The natural numbers can be identified with the finite ordinal numbers, 
or also with the finite cardinal numbers. Then the empty set 0 corresponds 
to the number 0, the class of sets with a single element to the number 1, 
and so forth. The cardinal number of the set {0, ..., n − 1} is n. In this way we 
can construct a theory of natural numbers on the basis of the theory of 
sets; and in particular, we obtain a model for the Peano axioms (cf. §10). 

If to a representative a of a given ordinal number we adjoin another 
element x, which thus becomes the “last” element in the sense of the
ordering, the set $b = a \cup \{x\}$ thus created represents an ordinal number
| b |, which is called the successor of | a | and is denoted by $|a|'$. Thus
there is no ordinal number between | a | and $|a|'$. Ordinal numbers
(except 0) which, unlike | b |, have no immediate predecessor are called 
limit numbers. 

Every non-empty set of ordinal numbers contains a smallest element 
(since the ordinal numbers are well-ordered). Thus we may state the 
principle of transfinite induction [a generalization of induction for the 
natural numbers (cf. §10.2)]: if w is a well-ordering for a set a, then a 
property H holds for all $x \in a$ if it satisfies the following conditions:

(1) H holds for the w-smallest element of a.

(2) If H holds for all x that are w-smaller than y ($y \in a$), then H also
holds for y.


The ordinal number of the set of natural numbers, well-ordered in the 
usual way, is denoted by $\omega$, which is thus the smallest transfinite ordinal
number. For a general discussion of the transfinite ordinal numbers, 
cf. IB1, Appendix. 

Functions whose domain is a transfinite set of ordinal numbers are
often defined inductively by means of three conditions; for example,
as follows ($\alpha$, $\beta$ are arbitrary ordinal numbers, $\lambda$ is an arbitrary
limit number, and $\lim_{\beta \in \lambda} f(\alpha, \beta)$ is the smallest ordinal number $\gamma$ with
$f(\alpha, \beta) \in \gamma$ for all $\beta \in \lambda$):

(1) $f(\alpha, 0) = \alpha$,
(2) $f(\alpha, \beta') = f(\alpha, \beta)'$,
(3) $f(\alpha, \lambda) = \lim_{\beta \in \lambda} f(\alpha, \beta)$.


This is not an explicit definition, since in (2) and (3) the symbol f
to be defined occurs on the right-hand side, but by transfinite induction
we can show that there exists exactly one function f with the properties
(1), (2), (3), and then we can write $\alpha + \beta$ for $f(\alpha, \beta)$. A schema of the
form (1), (2), (3) is called a transfinite inductive definition. If condition (3) 
is omitted, the result is a recursive definition, for functions whose argu- 
ments are natural numbers. For the justification of such a recursive 
definition we need only the usual complete induction (cf. §10.2). 
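
The finite case of such a schema, with conditions (1) and (2) only, can be sketched in Python. This is only an illustration under stated assumptions: the names `succ` and `add` are hypothetical, and Python integers stand in for the natural numbers, so the successor is borrowed from the host language rather than constructed.

```python
def succ(n: int) -> int:
    """Successor n' of a natural number n (assumed primitive here)."""
    return n + 1


def add(a: int, b: int) -> int:
    """Addition defined only by f(a, 0) = a and f(a, b') = f(a, b)'."""
    if a < 0 or b < 0:
        raise ValueError("arguments must be natural numbers")
    result = a              # condition (1): f(a, 0) = a
    for _ in range(b):      # condition (2), applied b times
        result = succ(result)
    return result
```

Each value `add(a, b)` is reached after finitely many applications of condition (2), which is why ordinary complete induction suffices to justify the definition in the finite case.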


7.5. Antinomies in the Naive Theory of Sets 

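It is easy to show that the power set $\mathfrak{P}x$ of any set x has a greater cardinal
number than x itself: $\overline{\overline{x}} < \overline{\overline{\mathfrak{P}x}}$. For finite sets x we have
$\overline{\overline{\mathfrak{P}x}} = 2^{\overline{\overline{x}}}$, which leads us to write $2^{\overline{\overline{x}}}$ for $\overline{\overline{\mathfrak{P}x}}$
in the case of infinite sets as well. The power
of the continuum is $2^{\aleph_0}$. If we form the set A of all sets (the so-called
universal set), we first of all have $\overline{\overline{A}} < \overline{\overline{\mathfrak{P}A}}$. On the other hand $\mathfrak{P}A$ is
certainly equivalent to a subset of A, in view of the definition of A; thus
$\overline{\overline{\mathfrak{P}A}} \leq \overline{\overline{A}}$, in contradiction to the fact that < is an ordering. This is the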
antinomy of the universal set. 
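
For a small finite set the inequality between a set and its power set can be checked exhaustively. The following sketch is an illustration only; the helper names `power_set` and `diagonal_set` are assumptions, and the check uses Cantor's diagonal set, which is one standard way to prove the inequality and is not necessarily the proof intended in the text.

```python
from itertools import chain, combinations, product


def power_set(x):
    """All subsets of the finite set x, as frozensets."""
    items = list(x)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


def diagonal_set(x, g):
    """The set {e in x : e not in g(e)} for a map g from x into the power set."""
    return frozenset(e for e in x if e not in g[e])


x = {0, 1, 2}
px = power_set(x)
elements = sorted(x)

# Enumerate every map g from x into its power set (8**3 = 512 maps).
all_maps = [dict(zip(elements, images))
            for images in product(px, repeat=len(elements))]

# No map reaches its own diagonal set, so no map from x is onto the power set.
every_map_misses_its_diagonal = all(
    diagonal_set(x, g) not in g.values() for g in all_maps
)
```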

Another example of a contradictory concept is the set $\Omega$ of all ordinal
numbers (antinomy of Burali-Forti). Like every set of ordinal numbers,
this set is well-ordered by <, and thus it has an ordinal number $|\Omega|$.
By the definition of a successor we have $|\Omega| < |\Omega|'$, but by the definition
of $\Omega$ we also have $|\Omega|' < |\Omega|$, in contradiction to the fact that < is a
well-ordering.

These examples show that caution must be exercised in the formation 
of sets. (Cf. also the Russell antinomy in §11.) 


7.6. Axiomatic Theory of Sets 

The antinomies of the naive theory of sets mostly arise from the fact that
arbitrary properties, described by propositional forms H(x), are admitted
for the definition of sets. Thus the trouble arises from assuming that for
every propositional form H(x) there exists a set a described by the axiom
schema $\bigwedge_x (x \in a \leftrightarrow H(x))$. In the axiomatization of von Neumann,
Bernays, and others, to which we now turn, this axiom schema (axiom
of comprehension) is suitably restricted.

The system deals with objects x, y, z, ..., called classes, between which
a two-place relation $\in$ can exist. Thus $x \in y$ is read: class x is an element
of class y. There is no formal distinction between classes and elements.
Certain classes are called sets: namely those which are elements of at
least one class:


(7.4) $Mx \leftrightarrow \bigvee_u x \in u$.


Our first task is to define equality of classes. It is clear that two classes 
may be regarded as identical if (1) they contain the same elements and if 
(2) whenever either one of them is an element of a class, the other is an 
element of the same class. 


(7.5) $a = b \leftrightarrow \bigwedge_x (x \in a \leftrightarrow x \in b) \wedge \bigwedge_x (a \in x \leftrightarrow b \in x)$.


For our first axiom we may take the principle of extensionality (7.1) from 
the naive theory of sets: 


(7.6) $\bigwedge_x (x \in a \leftrightarrow x \in b) \rightarrow a = b$.


Thus a class is completely determined by its elements. 
Now let H(x) be a relevant propositional form (see §4.5); for example,
$x = x$ or $x \in y \vee x \in z$. The restricted axiom of comprehension is


(7.7) $\bigwedge_x (H(x) \rightarrow Mx) \rightarrow \bigvee_u \bigwedge_x (x \in u \leftrightarrow H(x))$,


where H(x) does not contain u as a free variable.
Thus a property H(x) can be used as the definition of a class only if it 
refers exclusively to sets, that is to classes which can be an element of 
some class [cf. also (7.9)]. Then the class defined by H(x) is uniquely 
determined by (7.6) and can be given a name appropriate to its definition. 

Let us now try to prove, for example, the Russell antinomy (see §11)
by setting $x \notin x$ for H(x), so that from (7.7) we obtain


$\bigwedge_x (x \notin x \rightarrow Mx) \rightarrow \bigvee_u \bigwedge_x (x \in u \leftrightarrow x \notin x)$.

In particular, for x = u

(7.8) $\bigwedge_x (x \notin x \rightarrow Mx) \rightarrow \bigvee_u (u \in u \leftrightarrow u \notin u)$.


The right-hand side is obviously false, and therefore, by logical rules, the
left-hand side is also false. Thus $\neg \bigwedge_x (x \notin x \rightarrow Mx)$, and consequently


(7.9) $\bigvee_x (x \notin x \wedge \neg Mx)$.


Instead of a contradiction we have obtained the (acceptable) proposition
that there exists a class x (with $x \notin x$) which is not a set.

Let us now examine certain properties to see whether they are suitable 
for the definition of a class. 


(1) Mx for H(x). The premiss for (7.7) then reads $\bigwedge_x (Mx \rightarrow Mx)$ and
thus is satisfied. Consequently, there exists a class A which includes all sets
and which we therefore call the universal class:


(7.10) $x \in A \leftrightarrow Mx$.


(2) x = x for H(x). Because of $\bigwedge_x x = x$, the proposition
$\bigwedge_x (x = x \rightarrow Mx)$ would then lead to $\bigwedge_x Mx$, which contradicts (7.9).
Thus there is no class that includes all classes as its elements, and in this
way we have avoided the antinomy of the universal set.

(3) $x \neq x$ for H(x). This expression is always false, so that we always
have $H(x) \rightarrow Mx$. Thus $x \neq x$ defines a class which obviously contains
no element: it is the empty class 0.


(4) $x \in y \vee x \in z$ for H(x). Here Mx follows from $x \in y$ and also
from $x \in z$, so that the premiss of (7.7) is satisfied. Thus H(x) defines a
class that depends only on y and z, namely their union $y \cup z$.


Other classes can now be defined as in the naive theory of sets; for 
example, the intersection $a \cap b$ of two sets a and b, the class containing
one element {a}, and the class of pairs {a, b} and <a, b>. The theorems in
the algebra of classes can then be proved in the same way as in the naive 
theory of sets and, to a great extent, the theory of cardinal and ordinal 
numbers can be developed analogously. For this purpose we must 
introduce step by step the following axioms, which for the most part 
require that certain classes shall be sets. 

The axiom for the empty set: $M0$.

The axiom for sets with one element: $Mx \rightarrow M\{x\}$.

The first axiom for unions: $Mx \wedge My \rightarrow M(x \cup y)$.

The axiom of infinity: $M\,Nz$ (where Nz is the class of natural numbers).

The second axiom for unions: $Mx \rightarrow M \bigcup x$ ($\bigcup x$ is the union of all the
elements of x).

The replacement axiom: If the domain of a function (§8.4) is a set, then its
range is also a set. This axiom enables us to prove that for every set a
there exists a power class $\mathfrak{P}a$.

The power set axiom: $Mx \rightarrow M\mathfrak{P}x$.


The axiom of choice: If a is a class of non-empty sets x, there exists a
function (§8.4) f such that $f(x) \in x$ for all $x \in a$. (Thus from every set
$x \in a$ the function f “chooses” an element f(x).) Here also the axiom
of choice is an essential instrument in the proof of the well-ordering 
theorem. 


The continuum hypothesis: Between the cardinal number of an infinite 
set x and the cardinal number of its power set $\mathfrak{P}x$ there is no other cardinal
number. The particular case $\overline{\overline{x}} = \aleph_0$ is the special continuum hypothesis.
From the special hypothesis it follows that every uncountable subset of 
the set of real numbers has the power of the continuum. 


7.7. Independence of the Axiom of Choice and the Continuum Hypothesis 

In §4.7 we have mentioned the Gödel proof of relative consistency.
Gödel’s result can be formulated as follows: let $\Sigma$ be the system of axioms
for set theory as stated just above (§7.6), but without the axiom of choice A
and the continuum hypothesis K. Let it be assumed that $\Sigma$ is consistent
(although it is still unknown today whether this assumption is true).
Then $\neg A$ cannot be deduced from $\Sigma$, nor $\neg K$ from $\Sigma \cup \{A\}$.

In 1963 Cohen proved further that (if $\Sigma$ is consistent) it is also impossible
to deduce A from $\Sigma$ or K from $\Sigma \cup \{A\}$. Thus we have shown (see also
§4.4) that A is independent of $\Sigma$ and K is independent of $\Sigma \cup \{A\}$.


7.8. Symbols for Sets 

If we are given a propositional form ···x···, it is convenient to have a
symbol for the set of those x which possess the property corresponding to
this propositional form. Several notations are customary in the literature:


$\hat{x}(\cdots x \cdots)$, $\{x ; \cdots x \cdots\}$, $\{x \mid \cdots x \cdots\}$,


all of which are read: the set of x with the property ···x···. Let us note that
the set in question could be denoted just as well by $\hat{y}(\cdots y \cdots)$; in other
words, we are dealing here with a bound variable (cf. §2.6).

Exercises for §7 


1. Let (n) be the set of rational integers divisible by n. Illustrate the sets
(3), (6), (9), (15) by point sets in the plane (as in figures 1-4) in such a
way that the proper inclusions are correctly represented. What is the
number-theoretic significance of the various intersections and unions?


2. Show by dual representations that the set of real numbers x with
0 < x < 1 has the same power as the set of points <x, y> of the square
(0 < x < 1; 0 < y < 1).


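3. Let x′ be defined by $x \cup \{x\}$ (cf. 7.4); also let


(*) $n \in N \leftrightarrow \bigwedge_x [(0 \in x \wedge \bigwedge_y (y \in x \rightarrow y' \in x)) \rightarrow n \in x]$.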


Assume the principle of extensionality, the restricted axiom of com- 
prehension, the axiom of the empty set, the axiom for sets with one 
element, and the first axiom for unions. 

Prove: 

(a) The right-hand side of (*) defines a class N. 

(b) $0 \in N$

(c) EN 

(d) $\bigwedge_x (x \in N \rightarrow x' \in N)$

(e) $\bigwedge_x (x' \in N \rightarrow x \in N)$

(f) $\bigwedge_x (x' \in N \rightarrow 0 \in x')$


8. Theory of Relations 


8.1. The Concept of a Relation 


We may consider relations as properties of ordered pairs (§7.2). For 
example, 3 < 4 (3 stands in the <-relation to 4) states that the property
“smaller than” holds for the ordered pair <3, 4>. Or: the point A lies on
the line g states that the pair <A, g> has the property described by the
predicate lies on.

Analogously, we may regard an n-place relation as a property of ordered 
n-tuples. For example, the expression x + y =z defines a three-place 
relation for the natural numbers. Except when otherwise noted, we shall 
always take the word relation to mean a two-place relation. 

Relations have the same fundamental importance in mathematics as sets. 
Many of the basic concepts of mathematics are to be defined by relations 
(e.g., function, congruence, order) or at least can be understood in terms 
of relations (e.g., group, lattice, factor group, cf. §8.5). 

For simplicity we here take the naive point of view (cf. §7.1), so that 
relations may simply be regarded as sets of ordered pairs. Thus instead of 
saying: x is in the relation r to y (abbreviated xry), we can equally well 
say: the ordered pair <x, y> is an element of the set r: 


(8.1) $x r y \leftrightarrow \langle x, y \rangle \in r$.


The elements of the pairs are assumed to belong to a fixed ground set M
in which the relations are defined, and r, s, t, f, g, h, ... are variables
for them. For example, if M is the set Nz of natural numbers, then <m, n>
belongs to the relation $\leq$ if and only if $m \leq n$. Thus $\leq$ consists of the
pairs <0, 0>, <0, 1>, ..., <1, 1>, <1, 2>, ..., and so forth. By the first domain
$\vartheta_1(r)$ of a relation r we mean the set defined by $\bigvee_y x r y$, and by the second
domain $\vartheta_2(r)$ we mean the set defined by $\bigvee_y y r x$. For example,
$\vartheta_1(<) = \{0, 1, 2, \ldots\}$, $\vartheta_2(<) = \{1, 2, 3, \ldots\}$. The set $\vartheta_1(r) \cup \vartheta_2(r)$ is called the domain
of the relation r.
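
On a finite ground set, the view of a relation as a set of ordered pairs can be tried out directly. The sketch below is illustrative only: it restricts the relation < to the assumed ground set {0, ..., 4}, and the function names are hypothetical.

```python
M = range(5)  # assumed finite ground set {0, 1, 2, 3, 4}

# The relation < on M, represented as a set of ordered pairs.
less = {(m, n) for m in M for n in M if m < n}


def first_domain(r):
    """All x for which some y with x r y exists."""
    return {x for (x, _) in r}


def second_domain(r):
    """All x for which some y with y r x exists."""
    return {y for (_, y) in r}


def domain(r):
    """Union of the first and second domains."""
    return first_domain(r) | second_domain(r)
```

On this finite ground set the largest element 4 drops out of the first domain, whereas for the full set of natural numbers no element does.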

An important relation is the identity I, defined by $x I y \leftrightarrow x = y$. For
the class of natural numbers it consists of the pairs <0, 0>, <1, 1>, and so
forth. The empty or void relation, which contains no pair at all, will be
denoted here by 0. It is identical with the empty set 0 (§7.2). The universal
relation, which contains every pair with elements from M, will be denoted
by 1. It is to be distinguished from the “universal set.” Obviously we have
$\bigwedge_x \bigwedge_y \neg\, x 0 y$, $\bigwedge_x \bigwedge_y x 1 y$.


8.2. Combination of Relations (Algebra of Relations) 


Since the relations are defined as sets, it is clear what we mean by the
intersection $r \cap s$, the union $r \cup s$, and the complement $\bar{r}$:


(8.2) $x (r \cap s) y \leftrightarrow x r y \wedge x s y$, $x (r \cup s) y \leftrightarrow x r y \vee x s y$,
$x \bar{r} y \leftrightarrow \neg\, x r y$.


Similarly, the inclusion $r \subseteq s$ is defined by $\bigwedge_x \bigwedge_y (x r y \rightarrow x s y)$.

In addition to these purely set-theoretic constructions, there are two
other important ways of combining relations: the converse relation $\breve{r}$ and
the relative product rs. The converse relation is defined by:


(8.3) $x \breve{r} y \leftrightarrow y r x$.


Thus the converse $\breve{r}$ of r arises from r through “reversal” of all the pairs.

The relative product is defined by:


(8.4) $x (rs) y \leftrightarrow \bigvee_z (x r z \wedge z s y)$.


Thus the relative product rs of r and s arises, roughly speaking, from
“juxtaposition” of r and s. As may be shown by simple examples, this
operation is not commutative. For rr it is customary to write $r^2$. Thus,
if M is the class of natural numbers, we have: $I \subseteq\ \leq$, $< \cap\ I = 0$,
$rI = Ir = r$ for every r, $\breve{<}\ =\ >$, $I^2 = I$. A set of relations which is
closed (cf. IB10, §2.2) with respect to all these operations is called a
field of relations. For computation with relations we have the same rules
as for the algebra of sets (§7.2), and also certain other rules, which are
easily proved directly from the definitions; for example,


$(r \cap s)^{\smile} = \breve{r} \cap \breve{s}$, $(r \cup s)^{\smile} = \breve{r} \cup \breve{s}$, $\breve{\breve{r}} = r$, $(\bar{r})^{\smile} = \overline{\breve{r}}$,


$r(s \cup t) = (rs) \cup (rt)$, $r(s \cap t) \subseteq (rs) \cap (rt)$.
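
The converse and the relative product are easy to experiment with on a finite ground set. The following is a minimal sketch under the assumption that relations are Python sets of pairs over {0, 1, 2, 3}; the names `converse` and `rel_product` are illustrative and not taken from the text.

```python
def converse(r):
    """Converse relation: reverse every pair of r."""
    return {(y, x) for (x, y) in r}


def rel_product(r, s):
    """Relative product rs: x (rs) y iff x r z and z s y for some z."""
    return {(x, y) for (x, z1) in r for (z2, y) in s if z1 == z2}


M = range(4)  # assumed finite ground set
identity = {(x, x) for x in M}
less = {(x, y) for x in M for y in M if x < y}
greater = {(x, y) for x in M for y in M if x > y}
leq = less | identity

# A small example showing that the relative product is not commutative.
r = {(0, 1)}
s = {(1, 2)}
rs = rel_product(r, s)   # contains the single pair (0, 2)
sr = rel_product(s, r)   # empty: no z with x s z and z r y
```

Because every relation over a four-element set is a subset of sixteen pairs, the distributive rules for the relative product can also be checked by brute force on chosen examples.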


8.3. Special Properties of Relations 


A relation r is symmetric if $\bigwedge_x \bigwedge_y (x r y \rightarrow y r x)$, a requirement which by
(8.3) may also be written in the shorter form $r \subseteq \breve{r}$. Definitions like this
last one, which make no reference to the elements of the ground set, are
often more concise. In what follows we shall give the definitions, wherever
possible, in both forms, leaving to the reader the task of proving that they
are equivalent. In the examples, M is the class of natural numbers, unless
otherwise noted.

If xrx for all x, the relation r is reflexive ($I \subseteq r$). Example: $x \geq y$.

A transitive relation is defined by $\bigwedge_x \bigwedge_y \bigwedge_z ((x r y \wedge y r z) \rightarrow x r z)$.
(Alternatively written $r^2 \subseteq r$.) Example: $x < y$.

A relation is identitive if $\bigwedge_x \bigwedge_y ((x r y \wedge y r x) \rightarrow x = y)$. (In the shorter
form, $r \cap \breve{r} \subseteq I$.) Example: x is a factor of y.

A relation is connex if $\bigwedge_x \bigwedge_y (x r y \vee y r x)$. (In the shorter form, $r \cup \breve{r} = 1$.)
Example: $x \leq y$.

Relations which are transitive, identitive, and connex are called
orderings in the sense of $\leq$ (example: $x \leq y$). For orderings in the sense of <
the requirements of identitivity and connexity are replaced by $\bigwedge_x \neg\, x r x$
and $\bigwedge_x \bigwedge_y (x \neq y \rightarrow x r y \vee y r x)$ (example: $x < y$).

If we discard connexity altogether, we obtain the so-called partial
orderings (in the sense of $\leq$ or in the sense of <), which are sometimes
called semi-orderings. Examples are: inclusion, and strict inclusion
(cf. §7.2), for the set of all subsets of a given set. In more recent literature,
partial orderings in the above sense are sometimes called orderings, and
orderings are called total or complete orderings.

If an ordering $\leq$ contains no infinite “decreasing” sequence
$\cdots \leq x_3 \leq x_2 \leq x_1$ ($x_{i+1} \neq x_i$), it is called a well-ordering (of M),
where a distinction is to be made between well-orderings in the sense
of $\leq$ and well-orderings in the sense of <. Thus an ordering is a well-
ordering if and only if every non-empty subset $M_1$ of its field has a minimal
element in the sense of the ordering, i.e., an element for which there is no
smaller element in $M_1$. By discarding the requirement of connexity,
we obtain the partial well-orderings.

A set M is said to be directed with respect to a relation r if r is transitive
and if for every $x, y \in M$ there exists a $z \in M$ such that xrz and yrz.
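
The element-free ("shorter") forms of these definitions translate almost literally into set operations on finite relations. The sketch below assumes relations are Python sets of pairs over a finite ground set; the helper names and the dictionary keys are illustrative choices, not notation from the text.

```python
def converse(r):
    return {(y, x) for (x, y) in r}


def rel_product(r, s):
    return {(x, y) for (x, z1) in r for (z2, y) in s if z1 == z2}


def properties(r, ground):
    """Evaluate the element-free forms of the definitions for r on a finite ground set."""
    identity = {(x, x) for x in ground}
    universal = {(x, y) for x in ground for y in ground}
    rc = converse(r)
    return {
        "symmetric": r <= rc,
        "reflexive": identity <= r,
        "transitive": rel_product(r, r) <= r,
        "identitive": (r & rc) <= identity,
        "connex": (r | rc) == universal,
    }


numbers = range(1, 7)
leq = {(x, y) for x in numbers for y in numbers if x <= y}
divides = {(x, y) for x in numbers for y in numbers if y % x == 0}

leq_props = properties(leq, numbers)
divides_props = properties(divides, numbers)
```

Here `leq` is an ordering in the sense of ≤, while `divides` fails connexity (neither 2 divides 3 nor 3 divides 2) and is therefore only a partial ordering.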


8.4. Functions 


An important class of relations consists of the functions, defined by the
requirement of uniqueness $\bigwedge_x \bigwedge_y \bigwedge_z ((x r y \wedge x r z) \rightarrow y = z)$. (In the shorter
form, $\breve{r} r \subseteq I$.) For functions it is customary to write f(x) = y in place of
xfy. The function f is a mapping of the first domain $\vartheta_1(f)$ onto the second
domain $\vartheta_2(f)$; if $\vartheta_2(f)$ is contained in a set A, we say that f is a mapping
into A. If f(x) = y, we say that y is the image of x (under f) and that x is
the pre-image of y. If $\breve{f}$ is also a function (that is, $f \breve{f} \subseteq I$), then f is a
one-to-one (invertible) mapping of $\vartheta_1(f)$ onto $\vartheta_2(f)$, and $\breve{f}$ is called the inverse
function of f. Functions whose domain is the set of natural numbers are
also called sequences. On the basis of the definition (7.1) for equality
of sets, two functions are equal (or identical) if they have the same domain
and if for every element in that domain the two functions have the same
values.

As an example, let us formulate the Dedekind definition of an infinite 
set (§7.3) in the language of the theory of relations: 


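Infinite a $\leftrightarrow \bigvee_f (\breve{f} f \subseteq I \wedge f \breve{f} \subseteq I \wedge \vartheta_1(f) = a \wedge \vartheta_2(f) \subset a)$.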


In words: There exists a one-to-one mapping of a onto a proper subset of a. 

Two relations r, s are said to be isomorphic if there exists a one-to-one
mapping f of their fields onto each other such that $\bigwedge_x \bigwedge_y (x r y \leftrightarrow f(x) s f(y))$.

In mathematical literature a function f is often written in the form f(x),
but this notation is essentially incorrect, since it appears to mean that the
variable x is free. If we wish to use the variable x as part of the notation
for a function, we must indicate that this variable is bound. Acceptable
notations are $\lambda x f(x)$ or $x \rightarrow f(x)$.³¹


31 The second of these is more common in recent literature, but it is to be noted that 
the arrow here has nothing to do with the symbol for implication in §2.4. 


8.5. Equivalence and Congruence Relations 


Relations which are symmetric, reflexive, and transitive are called
equivalence relations (e.g., the identity I), cf. §4.4. They play an important
role in mathematics, especially in algebra.

If we assume reflexivity, we may replace the requirement of transitivity
and symmetry by that of comparativity: $x \sim z \wedge y \sim z \rightarrow x \sim y$.

Let ~ be an equivalence relation in M, and let $\bar{z}$ denote the set defined
by $x \in \bar{z} \leftrightarrow x \sim z$. This set is called the equivalence class generated by z
or corresponding to z. We have


(8.6) $z \in \bar{z}$,
(8.7) $(x \in \bar{z} \wedge y \in \bar{z}) \rightarrow x \sim y$,
(8.8) $(u \in \bar{x} \wedge u \in \bar{y}) \rightarrow \bar{x} = \bar{y}$.


All the elements of an equivalence class are thus equivalent to one 
another, i.e., they are related by ~. Two equivalence classes are either 
identical or without common element, so that every equivalence relation 
generates a partition of its field M into disjoint classes. Conversely, every 
such partition of M into classes generates an equivalence relation in M; 
for if M is the union of disjoint subclasses, we define: x ~ y <> (x and y 
lie in the same subclass). 
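
The passage from an equivalence relation to a partition can be carried out mechanically on a finite ground set. The sketch below is an illustration under assumptions: the function name `equivalence_classes` is hypothetical, and congruence modulo 3 on {0, ..., 11} is chosen merely as a convenient example.

```python
def equivalence_classes(ground, related):
    """Partition a finite ground set into the classes of an equivalence relation.

    `related(x, y)` is assumed to be reflexive, symmetric, and transitive,
    so comparing x with one representative of each class is sufficient.
    """
    classes = []
    for x in ground:
        for cls in classes:
            if related(x, next(iter(cls))):
                cls.add(x)
                break
        else:
            classes.append({x})
    return [frozenset(cls) for cls in classes]


ground = range(12)
classes = equivalence_classes(ground, lambda x, y: (x - y) % 3 == 0)
```

Conversely, the relation "x and y lie in the same member of `classes`" recovers the original equivalence relation.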

An equivalence relation defined in a ground set M gives rise to a 
process of abstraction (cf. §1.2), which means that elements of the same 
equivalence class are regarded as indistinguishable; in other words, we 
abstract from their distinguishing features. Conversely, every process of 
abstraction in M gives rise to an equivalence relation in the field M. 

If for a ground set M there are given finitely many k-place functions
$f_1, \ldots, f_n$ with values in M, then <M, $f_1, \ldots, f_n$> is called an abstract algebra
(cf. IB10, §1.2). For example, let there be given a two-place function f,
whose value for the arguments x, y we shall write in the form $x \cdot y$.
Then it is clear that we shall usually be interested in those abstractions
that preserve the operation $x \cdot y$; that is, if we denote the new equality
by ~, we must be able to define $\bar{x} \cdot \bar{y}$ as $\overline{x \cdot y}$. This will be possible if

(8.9) $\bigwedge_{x_1} \bigwedge_{x_2} \bigwedge_{y_1} \bigwedge_{y_2} ((x_1 \sim x_2 \wedge y_1 \sim y_2) \rightarrow x_1 \cdot y_1 \sim x_2 \cdot y_2)$.

In this case the equivalence relation ~ is called a congruence relation
(with respect to the operation $x \cdot y$). The situation can also be described
in the following way: a congruence relation is an equivalence relation
that is consistent with the operations of the abstract algebra. For example,
in the ring of rational integers, $x \equiv y \pmod 6$ is a congruence with
respect to addition and multiplication (cf. IB6, §4.1).
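
Condition (8.9) can be tested by brute force on a finite window of integers. The following sketch is illustrative only; the function name `is_congruence`, the window from -6 to 6, and the counterexample relation "same absolute value" are assumptions made for the example. A finite check of this kind can refute the congruence property but does not prove it for all integers.

```python
from itertools import product


def is_congruence(ground, related, op):
    """Check condition (8.9) for a two-place operation on a finite ground set."""
    return all(
        related(op(x1, y1), op(x2, y2))
        for x1, x2, y1, y2 in product(ground, repeat=4)
        if related(x1, x2) and related(y1, y2)
    )


window = range(-6, 7)  # assumed finite sample of the rational integers


def mod6(x, y):
    return (x - y) % 6 == 0


def same_abs(x, y):
    return abs(x) == abs(y)


def add(x, y):
    return x + y


def mul(x, y):
    return x * y
```

The relation `same_abs` is an equivalence relation that respects multiplication but not addition: 1 and -1 are equivalent, yet 1 + 1 = 2 and -1 + 1 = 0 are not.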

If the algebra has a unit element e such that $\bigwedge_x (x \cdot e = x)$, then the
set of x with x ~ e forms a subalgebra N, since from x ~ e, y ~ e it
follows that $x \cdot y \sim e \cdot e = e$. Let $x \cdot N$ be the set of products of x with
arbitrary elements of N. For every x we have $x \cdot N \subseteq \bar{x}$. The congruence
classes (complete classes of mutually congruent elements) form an algebra
of the same “type.” If $\bigwedge_x x \cdot N = \bar{x}$, then N is a “normal factor.” In this
way many algebraic concepts and theorems (e.g., the theorem of
Jordan-Hölder; see IB2, §12.1) can be interpreted as concepts and theorems in
the theory of relations.


Exercises for §8 


1. Prove (cf. §§8.3 and 8.4) that 


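r is reflexive $\leftrightarrow I \subseteq r$,

r is transitive $\leftrightarrow r^2 \subseteq r$,

r is identitive $\leftrightarrow r \cap \breve{r} \subseteq I$,

r is connex $\leftrightarrow r \cup \breve{r} = 1$,

r is a function $\leftrightarrow \breve{r} r \subseteq I$,

r is a one-to-one mapping $\leftrightarrow \breve{r} r \cup r \breve{r} \subseteq I$.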


2. State the axiom of choice and the well-ordering theorem in the symbolic 
language developed in §§7 and 8. 


Bibliography 


For the concepts and applications of the theory of relations see Carnap [2]. 


9. Boolean Algebra 


9.1. Preliminary Remarks 

In the present section we are interested in certain phenomena that first 
came to light in the study of the propositional calculus (§2); the fact that 
they are essentially algebraic in nature was first recognized by G. Boole 
(1847). 

Let us consider the one-place predicates P, Q, ... for a fixed domain of 
individuals M (cf. §3). These predicates can be put in one-to-one
correspondence with the subsets p, q, ... of M by assigning x to p if and only if P
holds for x; that is,


(9.1) $x \in p \leftrightarrow Px$.

The conjunction of two predicates obviously corresponds to the
intersection of two sets; similarly, the alternative corresponds to their union:


(9.2) $x \in p \cap q \leftrightarrow Px \wedge Qx$, $x \in p \cup q \leftrightarrow Px \vee Qx$.


The distributive, associative, and other laws for $\cap$ and $\cup$ correspond to
the same laws for $\wedge$ and $\vee$. Negation corresponds to complementation
($x \in \bar{p} \leftrightarrow \neg Px$), where (cf. §2.2):


(9.3) $p \cup \bar{p} = M$, $Px \vee \neg Px \leftrightarrow T$,
$p \cap \bar{p} = 0$, $Px \wedge \neg Px \leftrightarrow F$.


Logical implication corresponds to set-theoretic inclusion:

(9.4) $p \subseteq q \leftrightarrow \bigwedge_x (Px \rightarrow Qx)$.


We see that the domain of predicates for M has the same “structure”’ 
as the domain of subsets of M; the two domains are isomorphic. For the 
general study of such domains it is therefore natural to introduce an 
abstract algebra by means of axioms. The system of axioms will be 
autonomous in the sense of §4. 


9.2. Boolean Lattices 
A set M of elements a, b, ... with operations $\cap$, $\cup$, $\bar{\ }$ is called a Boolean
lattice if the following axioms are satisfied:

B0. $a \cap b$, $a \cup b$, $\bar{a}$ are defined for all elements of M and are themselves
elements of M.

B11. $a \cap b = b \cap a$
B12. $a \cup b = b \cup a$ (Commutative laws)

B21. $a \cap (b \cap c) = (a \cap b) \cap c$
B22. $a \cup (b \cup c) = (a \cup b) \cup c$ (Associative laws)

B31. $a \cap (a \cup b) = a$
B32. $a \cup (a \cap b) = a$ (Absorption laws)

B41. $a \cap (b \cup c) = (a \cap b) \cup (a \cap c)$
B42. $a \cup (b \cap c) = (a \cup b) \cap (a \cup c)$ (Distributive laws)

There exist elements 0 and 1 in M such that for every a in M

B51. $a \cap \bar{a} = 0$
B52. $a \cup \bar{a} = 1$ (Complementation laws)


In the set-theoretic interpretation, $a \cap b$ and $a \cup b$ are read as
intersection of a and b and union of a and b, respectively, and in the logical
interpretation, as “a and b” and “a or b.” This system of axioms is denoted by $\mathfrak{B}$.

Looking through the list of axioms in $\mathfrak{B}$, we see that for every axiom
there exists a dual axiom, formed by interchanging $\cap$ with $\cup$ and 0 with 1.
Thus for every theorem there is also a dual theorem, whose statement
and proof arise from the given theorem by these interchanges (principle
of duality for Boolean algebra). A corresponding principle of duality
holds for the predicate logic, if we interchange T and F. For example,
the theorem $\bigwedge_x Px \vee \bigvee_x \neg Px \leftrightarrow T$ is dual to $\bigvee_x Px \wedge \bigwedge_x \neg Px \leftrightarrow F$.

Let us state a few easily proved theorems for Boolean lattices:


(9.5) $a \cap a = a$, $a \cup a = a$,
$a \cap 0 = 0$, $a \cup 1 = 1$, $a \cup 0 = a$, $a \cap 1 = a$,
$a \cup b = b \leftrightarrow a \cap b = a$, $a \cap b = b \leftrightarrow a \cup b = a$,
$\bar{0} = 1$, $\bar{1} = 0$.


A domain with operations $\cap$ and $\cup$, for which only the axioms B0
(without complementation), B1, B2, and B3 are required, is called a lattice.
Boolean lattices are distributive and complemented (cf. IB9, §1).
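
That the subsets of a set satisfy these axioms can be confirmed by exhaustive checking in a small case. The sketch below is only an illustration: it assumes the three-element set {a, b, c} as the unit element and uses hypothetical names (`meet`, `join`, `complement`) for the three operations.

```python
from itertools import combinations, product

UNIVERSE = frozenset({"a", "b", "c"})  # plays the role of the element 1
ELEMENTS = [frozenset(c)
            for k in range(len(UNIVERSE) + 1)
            for c in combinations(sorted(UNIVERSE), k)]
ZERO, ONE = frozenset(), UNIVERSE


def meet(a, b):
    return a & b


def join(a, b):
    return a | b


def complement(a):
    return UNIVERSE - a


def satisfies_boolean_axioms():
    """Check the commutative, associative, absorption, distributive,
    and complementation laws for all triples of subsets."""
    for a, b, c in product(ELEMENTS, repeat=3):
        laws = [
            meet(a, b) == meet(b, a),
            join(a, b) == join(b, a),
            meet(a, meet(b, c)) == meet(meet(a, b), c),
            join(a, join(b, c)) == join(join(a, b), c),
            meet(a, join(a, b)) == a,
            join(a, meet(a, b)) == a,
            meet(a, join(b, c)) == join(meet(a, b), meet(a, c)),
            join(a, meet(b, c)) == meet(join(a, b), join(a, c)),
            meet(a, complement(a)) == ZERO,
            join(a, complement(a)) == ONE,
        ]
        if not all(laws):
            return False
    return True
```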


9.3. Inclusion in Boolean Lattices 
Inclusion can be defined by


(9.6) $a \subseteq b \leftrightarrow a = a \cap b$,


which corresponds to the set relation (§7.2), or equivalently by
$b \subseteq a \leftrightarrow a = a \cup b$.

Let $a \subset b$ signify that $a \subseteq b$ and $a \neq b$. It is easy to show that the
relation $\subseteq$ is reflexive, transitive, and identitive, and is thus a partial
ordering in the sense of $\leq$ (cf. §8.3). Also,


(9.7) $a \cap b \subseteq a$, $a \subseteq a \cup b$, $a \subseteq 1$, $0 \subseteq a$,
(9.8) $(a \subseteq b \wedge a \subseteq c) \rightarrow a \subseteq b \cap c$, $(b \subseteq a \wedge c \subseteq a) \rightarrow b \cup c \subseteq a$.


From (9.8) it follows that $b \cap c$ and $b \cup c$ may serve as greatest lower
bound and least upper bound of b and c with respect to $\subseteq$. Every element
that is contained in b and c is also contained in the greatest lower bound
$b \cap c$, and every element that contains b and c also contains the least upper
bound $b \cup c$ of b and c. The greatest lower bound of all the elements is 0,
and their least upper bound is 1. Thus every lattice is partially ordered,
with a least upper bound and a greatest lower bound for arbitrary a and b.
Conversely, the above properties of inclusion may be used to construct
a lattice from a partial ordering with least upper bound and greatest lower
bound. For example, we may define $x = a \cap b$ by


(9.9) $x = a \cap b \leftrightarrow \bigwedge_z ((z \subseteq a \wedge z \subseteq b) \leftrightarrow z \subseteq x)$.



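If we note that for $a \cap b = c \cup d$ we may also write
$\bigvee_x (x = a \cap b \wedge x = c \cup d)$, we see that the axioms of $\mathfrak{B}$ may be at once translated into axioms
for $\subseteq$. For a = 0 we write $\bigwedge_x (a \subseteq x)$; for a = 1 we write $\bigwedge_x (x \subseteq a)$;
and for $a = \bar{b}$ we write $\bigwedge_x (x \subseteq a \cup b) \wedge \bigwedge_x (a \cap b \subseteq x)$.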


9.4. Boolean Rings 


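A third possibility for the description of Boolean algebra lies in the
theory of rings. We define


(9.10) $a \cdot b \rightleftharpoons a \cap b$, $a + b \rightleftharpoons (a \cap \bar{b}) \cup (\bar{a} \cap b)$.

Then we can easily show


(9.11) $a \cdot b = b \cdot a$, $(a \cdot b) \cdot c = a \cdot (b \cdot c)$, $a + b = b + a$,
$a + (b + c) = (a + b) + c$, $a \cdot (b + c) = a \cdot b + a \cdot c$,
$a \cdot 1 = a$, $a + 0 = a$.


(9.12) $a \cdot a = a$, $a + a = 0$.


These are the axioms for a commutative idempotent ring with unity
element. Such a ring is called a Boolean ring. Conversely, from a Boolean
ring we can form a Boolean lattice by setting


(9.13) $a \cap b \rightleftharpoons a \cdot b$, $a \cup b \rightleftharpoons a + b + a \cdot b$, $\bar{a} \rightleftharpoons 1 + a$.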
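
In the lattice of subsets the ring addition is the symmetric difference and the ring multiplication is the intersection, so both directions of the correspondence can be checked on a small example. This is a minimal sketch assuming the subsets of {a, b, c}; the function names `times` and `plus` are illustrative.

```python
from itertools import combinations, product

UNIVERSE = frozenset({"a", "b", "c"})  # the unity element 1
ELEMENTS = [frozenset(c)
            for k in range(len(UNIVERSE) + 1)
            for c in combinations(sorted(UNIVERSE), k)]
ZERO, ONE = frozenset(), UNIVERSE


def times(a, b):
    """Ring product: the intersection."""
    return a & b


def plus(a, b):
    """Ring sum: the part of a outside b together with the part of b outside a."""
    return (a - b) | (b - a)


def ring_laws_hold():
    """Check the ring identities, including idempotence and a + a = 0."""
    for a, b, c in product(ELEMENTS, repeat=3):
        laws = [
            times(a, b) == times(b, a),
            times(times(a, b), c) == times(a, times(b, c)),
            plus(a, b) == plus(b, a),
            plus(a, plus(b, c)) == plus(plus(a, b), c),
            times(a, plus(b, c)) == plus(times(a, b), times(a, c)),
            times(a, ONE) == a,
            plus(a, ZERO) == a,
            times(a, a) == a,
            plus(a, a) == ZERO,
        ]
        if not all(laws):
            return False
    return True


def lattice_is_recovered():
    """Check that union and complement are recovered from the ring operations."""
    return all(
        plus(plus(a, b), times(a, b)) == a | b and plus(ONE, a) == UNIVERSE - a
        for a in ELEMENTS for b in ELEMENTS
    )
```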


9.5. Finite Boolean Lattices 


The subsets of a finite set M form a finite Boolean lattice with respect
to the set-theoretic operations. Here the empty set represents the element 0
and the whole set M represents the element 1. If M has n elements, then
the lattice has $2^n$ elements (cf. §7.2). Thus every finite Boolean lattice has
$2^n$ elements (n = 0, 1, 2, ...), since we can show that every finite Boolean
lattice is isomorphic to a lattice of subsets. The proof of this theorem
rests on the fact that every element of a finite Boolean lattice is the union
of atoms in the lattice, where an element a is called an atom if $a \neq 0$
and if from $x \subset a$ it follows that x = 0. The atoms of a lattice of subsets
are the sets with one element {x} (see §7.2). The finite Boolean lattices
can be very clearly illustrated by diagrams in which the elements are
represented by points in a plane in such a way that if $a \subset b$ and
$\neg \bigvee_c (a \subset c \wedge c \subset b)$, then a lies below b and is joined to b by a line
segment. Thus if the number of elements is $2^0$, $2^1$, $2^2$, $2^3$, we obtain
the following figures:


** For the concept of rings see IB5, §1.5ff.; a ring is idempotent if $a \cdot a = a$ for
each of its elements.

[Fig. 6, Fig. 7, Fig. 8, Fig. 9: diagrams of the finite Boolean lattices; the top point of Fig. 9 is labeled {a, b, c}.]


The diagram for the lattice with $2^3$ elements shows the subsets of the
3-element set M = {a, b, c}.
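
The atoms and the line segments of such a diagram can be computed for the subsets of a three-element set. The sketch below is illustrative; the names `is_atom` and `covering_pairs` are assumptions, and Python's `<` on frozensets is used for strict inclusion.

```python
from itertools import combinations

M = ("a", "b", "c")
ELEMENTS = [frozenset(c)
            for k in range(len(M) + 1)
            for c in combinations(M, k)]


def is_atom(a):
    """a is an atom if it is not 0 and every x strictly below a is 0."""
    return bool(a) and all(not x for x in ELEMENTS if x < a)


atoms = [a for a in ELEMENTS if is_atom(a)]

# Pairs (a, b) joined by a line segment: a strictly below b with nothing between.
covering_pairs = [
    (a, b)
    for a in ELEMENTS
    for b in ELEMENTS
    if a < b and not any(a < c < b for c in ELEMENTS)
]


def union_of_atoms_below(a):
    """Union of all atoms contained in a."""
    return frozenset().union(*[t for t in atoms if t <= a])
```

For three generators the diagram is the edge graph of a cube: 8 points and 12 line segments.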


Exercises for §9 

1. Prove 
(a) (9.5) from the system of axioms $\mathfrak{B}$,
(b) (9.7) and (9.8) from $\mathfrak{B}$ and (9.6),
(c) (9.11) and (9.12) from $\mathfrak{B}$ and (9.10).

2. Consider propositional forms constructed from countably many
(cf. 7.3) propositional variables p, q, ... (cf. 2.4) by the connectives
$\neg$, $\wedge$, $\vee$, $\rightarrow$, $\leftrightarrow$ (cf. 2.4) of the propositional calculus. Define

$H \sim \Theta \leftrightarrow$ ($H \leftrightarrow \Theta$ is a tautology (3.4)).
Now prove 
(a) ~ is an equivalence relation 


(b) ~ is consistent with the functions K, A, N defined on the set of 
propositional forms as follows: 


$K(H, \Theta) = H \wedge \Theta$
$A(H, \Theta) = H \vee \Theta$
$N(H) = \neg H$.


(c) The equivalence classes form a Boolean algebra under the following 
definitions: 


(1) $\overline{H} \cap \overline{\Theta} = \overline{H \wedge \Theta}$
(2) $\overline{H} \cup \overline{\Theta} = \overline{H \vee \Theta}$
(3) the complement of $\overline{H}$ is $\overline{\neg H}$
(4) $0 = \overline{p \wedge \neg p}$
(5) $1 = \overline{p \vee \neg p}$

By (b) the definitions (1)-(3) are independent of the representatives of
the equivalence classes. Show that in (4) and (5) the definitions are
independent of the choice of the propositional variable p.


(d) If the number of propositional variables is finite, then the Boolean
algebra is also finite. If n is the number of variables, then the
number of elements in the Boolean algebra is $2^{2^n}$.
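
The count in part (d) can be made plausible by computation, since two propositional forms are equivalent exactly when they have the same truth table. The sketch below is not a proof and rests on assumptions: each equivalence class is represented by its truth table (a tuple of booleans, one entry per assignment), only the connectives for negation, conjunction, and alternative are used, and the name `generated_classes` is hypothetical.

```python
from itertools import product


def generated_classes(n):
    """Truth tables of all forms built from n >= 1 variables by not, and, or."""
    rows = list(product((False, True), repeat=n))
    # Start with the tables of the variables themselves.
    tables = {tuple(row[i] for row in rows) for i in range(n)}
    while True:
        new = set(tables)
        new |= {tuple(not v for v in t) for t in tables}
        new |= {tuple(x and y for x, y in zip(s, t))
                for s in tables for t in tables}
        new |= {tuple(x or y for x, y in zip(s, t))
                for s in tables for t in tables}
        if new == tables:   # closed under the three connectives
            return tables
        tables = new
```

For n = 1, 2, 3 the closure contains 4, 16, and 256 tables, in agreement with the formula in (d).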


Bibliography 


For Boolean algebra, see Goodstein [1]. 


10. Axiomatization of the Natural Numbers 


10.1. Preliminary Remarks 


The theory of natural numbers occupies an especially important place 
in studies in the foundations of mathematics. In the first place, the 
arithmetic of natural numbers offers a simple and important example of 
a theory with an infinite domain of individuals, in which the problems 
connected with the concept of infinity can be studied. Secondly, it has 
turned out that many other interesting metamathematical questions can 
be reduced to arithmetic (cf. the arithmetization in §5.4). Finally, the 
results of Gödel on arithmetical algorithms have had a lasting influence
on the whole program of metamathematics. Let us discuss these remarks 
in greater detail. 

The “leap to infinity” involved in recognizing the domain of the natural 
numbers is already adequate for all the ontological needs of the predicate 
logic (cf. §3); this is the meaning of the fundamental theorem of Löwenheim
and Skolem, which essentially states that in order to investigate the 
concept of a consequence there is no need to use any domain of individuals 
other than the natural numbers. 

Since in a system of axioms $\mathfrak{S}$ the means of expression (variables, 
logical symbols, and so forth) are obviously countable, it is clear that the 
obtainable expressions are also countable. Thus the expressions can be 
“numbered” constructively (see §5.4). For every expression the resulting 
index is computable and, conversely, for every number we can decide 
whether or not it is the index of an expression; if it is, then the expression 
can be recovered. As a result, certain metamathematical properties 
like “... is an expression,” “... is the conjunction of ... and ...,” “... is true” are 
transformed into number-theoretical properties. Thus all questions of 
decidability can be translated into the corresponding questions for 
arithmetic. Moreover, if the system $\mathfrak{S}$ includes an arithmetical system of 
axioms, many of the metamathematical propositions about $\mathfrak{S}$ can be 
formulated in $\mathfrak{S}$ itself, and in this way it is possible to obtain extremely 
general theorems about mathematical systems of axioms (cf. §10.5). 
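
A constructive numbering of this kind can be sketched in a few lines of Python. The sketch makes several assumptions that are not taken from §5.4: the alphabet `SYMBOLS` is hypothetical, the coding by prime powers is only one of many possible choices, and rows of symbols are numbered without checking that they are well-formed expressions.

```python
SYMBOLS = ["0", "'", "=", "(", ")", "x", "+", "*", "~", "&", ">"]  # hypothetical alphabet
CODE = {s: i + 1 for i, s in enumerate(SYMBOLS)}

def primes():
    """Yield 2, 3, 5, ... by trial division."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def encode(row):
    """Index of a row of symbols: the product of p_i ** code(i-th symbol)."""
    number = 1
    for p, s in zip(primes(), row):
        number *= p ** CODE[s]
    return number

def decode(number):
    """Recover the row of symbols with this index, or None if there is none."""
    if number < 2:
        return None
    symbols = []
    for p in primes():
        if number == 1:
            return "".join(symbols)
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        if not 1 <= exponent <= len(SYMBOLS):
            return None          # a prime is skipped or the code is out of range
        symbols.append(SYMBOLS[exponent - 1])
```

Here `encode` computes the index of a row of symbols, and `decode` decides whether a number is an index and, if so, recovers the row.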

For a long time the concept of the (infinite!) totality of natural numbers 
was held to be intuitively clear, and indeed quite self-evident [cf. the similar 
situation for the concept of a set (§7.1)]. It was Frege (1884) who first 
pointed out the necessity for an exact definition of a natural number. 
In his attempt to reduce arithmetic to logic he defined the number 1, 
for example, as the totality of all one-place predicates that hold for 
exactly one individual. This definition is closely related to the set- 
theoretical introduction of the natural numbers and leads to the same kind 
of difficulties as the naive theory of sets (§7.3). Thus we naturally seek, 
as in that theory, to characterize the natural numbers by a system of 
axioms. The best-known system of axioms for the natural numbers is due 
to Dedekind (1888) but is named after Peano (1889). In §10.3 we shall 
discuss a somewhat modified system, formulated in the language of 
predicate logic. The question of axiomatizing the whole of arithmetic 
(§10.4) then leads us to the well-known Incompleteness Theorem of Gödel 
(§10.5). The present section closes with some remarks on the operational 
construction of arithmetic recently proposed by Lorenzen. 


10.2. The Peano Axioms 
The Peano axioms (with unimportant changes): 


(a) 0 is a natural number.³³ 
(b) If n is a natural number, then so is n'. 
(c) If m' = n', then m = n. 
(d) There is no number n for which n' = 0. 
(e) Axiom of complete induction: 
If a property P of the natural numbers satisfies the following two 
conditions, then P holds for every natural number: 
(1) P holds for 0. 
(2) For every natural number n, if P holds for n, then P holds for n'. 

These axioms can be stated in a formal language consisting, as before, of 
formulas or rows of symbols, but now, in view of the fact that the axiom (e) 
speaks of an arbitrary property, we must make use of a generalized 
predicate variable; that is, a predicate variable bound by the universal 
quantifier. Expressions with quantified predicate variables are regarded 
as belonging to logic of the second order, or to the extended predicate logic. 
Expressions in which only subject variables are quantified are said to 
belong to logic of the first order, or to elementary predicate logic. For the 
extended predicate logic, as well as for the elementary (§3), it is possible 
to give a semantic definition of the concept of a consequence. 

33 The sequence of natural numbers is often taken to begin with 1. 

Except for axiom (e) we will continue to confine our arithmetical 
expressions to the elementary logic. In particular, in questions of 
completeness and decidability we shall consider only relevant expressions of 
the first order. The fundamental concepts of our system of axioms are: 
(1) an individual variable for zero; as such we take the traditional symbol 0; 
(2) a predicate variable for the relation of successor; we make use of the 
functional notation and denote the successor of x by x’ (cf. §2.5); (3) a 
predicate variable for identity; as such we use the traditional symbol =. 
There is no need to mention the axioms (a) and (b), since we do not admit 
any individuals other than the natural numbers. In a supplementary axiom 
we express the conditions that must be satisfied by the identity. 

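The Peano system of axioms $\mathfrak{P}$ in the extended predicate logic.³⁴ 


(P1) $x' = y' \rightarrow x = y$, 

(P2) $\neg\, x' = 0$, 

(Ind) $\bigwedge_P \bigl(P0 \wedge \bigwedge_y (Py \rightarrow Py') \rightarrow \bigwedge_x Px\bigr)$, 

(G) $x = y \leftrightarrow \bigwedge_P (Px \rightarrow Py)$. 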


The semantic consistency (cf. §4.7) of $\mathfrak{P}$ is obvious for anyone who feels 
convinced of the “existence” of the natural numbers. But for the extended 
predicate logic we have not yet defined a concept of deducibility, so that 
for the time being the question of syntactical consistency (cf. §5.7) does not 
arise. 

The system $\mathfrak{P}$ is monomorphic (§4.6) and thus, as desired, it characterizes 
the natural numbers. Let us outline the proof. 

Let $M$ and $\bar{M}$ be arbitrary models (cf. §3) of $\mathfrak{P}$. Then $M$ contains a 
domain of individuals $J$, a function $f$ (for $x'$) defined on $J$, and a fixed 
element $n$ (representing 0) in $J$. We denote the corresponding objects for 
$\bar{M}$ by $\bar{J}$, $\bar{f}$, $\bar{n}$. We must now show that $M$ and $\bar{M}$ are isomorphic (§8.4); 
that is, we must demonstrate the existence of a mapping $\Phi$ of $J$ onto $\bar{J}$ 
with the properties of an isomorphism: 


(10.1) $\Phi(n) = \bar{n}$, 
(10.2) $\Phi(f(x)) = \bar{f}(\Phi(x))$. 


34 For clarity, we have emphasized here that P is generalized, i.e., bound by the 
universal quantifier. Of course, x and y are also to be considered as generalized. 

First we define inductively a relation $\Phi$ by 

(10.3) $n \,\Phi\, \bar{n}$, 

(10.4) $\bigwedge_x \bigwedge_y \bigl((x \in J \wedge y \in \bar{J}) \rightarrow (x \,\Phi\, y \rightarrow f(x) \,\Phi\, \bar{f}(y))\bigr)$, 

(10.5) Let $x \,\Phi\, y$ hold only as required by (10.3) or (10.4). 


We now prove step by step [with tacit use of the axiom of equality (G)]: 


(1) The first domain of $\Phi$ is $J$ (proof by the axiom of induction for 
the model $M$). 

(2) The second domain of $\Phi$ is $\bar{J}$ (proof by the induction axiom for the 
model $\bar{M}$). 

(3) There is no $x$ in $J$ with $n = f(x)$ [proof by the axiom (P2) for $M$]. 

(4) There is no $x$ in $J$ with $f(x) \,\Phi\, \bar{n}$ [proof by (3) and (10.3), (10.4), (10.5)]. 

(5) If $x \,\Phi\, \bar{n}$ and $y \,\Phi\, \bar{n}$, then $x = y$ [proof by (4) and (1)]. 

(6) If $x \,\Phi\, z$ and $y \,\Phi\, z$, then $x = y$; that is, $\Phi^{-1}$ is a function (§8.3) [proof by 
induction for $\bar{M}$, (5) and (P1)]. 

(7) $\Phi$ is a function [proof analogous to (6)]. 


Thus we have shown that $\Phi$ is a one-to-one mapping of $J$ onto $\bar{J}$, from 
which the properties of an isomorphism follow immediately by (10.3) 
and (10.4). 
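
The inductive generation of the relation by (10.3) and (10.4) can be followed on a finite fragment. In the Python sketch below the two models are our own hypothetical choices (the integers 0, 1, 2, ... with x + 1 as successor, and the rows "", "|", "||", ... with an appended stroke as successor), and only finitely many steps are carried out, so the sketch illustrates the construction without proving anything about the infinite models.

```python
def build_phi(n, f, n_bar, f_bar, steps):
    """Generate pairs of the relation by (10.3) and `steps` uses of (10.4)."""
    pairs = [(n, n_bar)]                  # (10.3): n is related to n_bar
    for _ in range(steps):
        x, y = pairs[-1]
        pairs.append((f(x), f_bar(y)))    # (10.4): from (x, y) pass to (f(x), f_bar(y))
    return pairs

# Hypothetical models: M with n = 0, f(x) = x + 1; M_bar with n_bar = "", f_bar(y) = y + "|".
pairs = build_phi(0, lambda x: x + 1, "", lambda y: y + "|", steps=20)
phi = dict(pairs)

# Steps (6) and (7) on the fragment: the relation and its inverse are functions.
is_function = len(phi) == len(pairs)
is_one_to_one = len(set(phi.values())) == len(pairs)
# Properties (10.1) and (10.2) on the fragment.
maps_zero = phi[0] == ""
commutes = all(phi[x + 1] == phi[x] + "|" for x in range(20))
```

On this fragment all four checks hold.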

We must note, however, that this proof can be attacked on the ground 
that it is based in an essential way on semantic ideas that are closely 
associated with the naive theory of sets. For in fact the “totality of all 
properties” referred to in (G) and (Ind) is uncountable. From the 
monomorphy of $\mathfrak{P}$ it follows that $\mathfrak{P}$ is complete (cf. §4.5). 


10.3. The Peano Axioms with Restricted Axiom of Induction 


We now turn to an axiom system $\mathfrak{P}_1$, which completely avoids the 
extended predicate logic. In order to exclude quantification of predicate 
variables, we must first make some change in the axiom of equality (G). 
Let us replace it by the two axioms 


(G1) $x = x$, 
(G2) $x = y \rightarrow (H(x) \rightarrow H(y))$. 


Since for H(x) we may write any expression of the elementary predicate 
logic, it follows that, strictly speaking, (G2) is not an axiom but an axiom 
schema (§4) which in an obvious way represents countably many axioms. 
The axioms (P1) and (P2) remain unchanged, but for (Ind) we must also 
introduce an axiom schema: 


(Ind$_1$) $H(0) \wedge \bigwedge_y (H(y) \rightarrow H(y')) \rightarrow H(x)$ (induction schema). 


The system (G1, G2, P1, P2, Ind$_1$) will be denoted by $\mathfrak{P}_1$. Like $\mathfrak{P}$, the 
system $\mathfrak{P}_1$ is of course semantically consistent. On the other hand, 
monomorphy is lost in the transition from $\mathfrak{P}$ to $\mathfrak{P}_1$. For we see that the 
proof of monomorphy for $\mathfrak{P}$ cannot simply be repeated for $\mathfrak{P}_1$, since the 
properties to which (Ind) was applied in that proof are not necessarily 
capable of formulation (and in fact cannot be formulated) in the elementary 
predicate logic (cf. §10.2). In §10.5 we shall see that $\mathfrak{P}_1$ actually admits 
nonisomorphic models. It can be shown that the set of deductions from 
$\mathfrak{P}_1$ or from $\mathfrak{P}$ is decidable.³⁵ These systems are therefore complete and 
their theorems can be obtained by algorithms. 


10.4. Systems $\mathfrak{Z}$ and $\mathfrak{Z}_1$ for Arithmetic 

For the construction of arithmetic it is clear that the successor function 
alone is not enough. We also need addition and multiplication. These 
functions, as we know, can be defined recursively (§5.6), and the equations 
defining them can be adjoined to the axioms. Let us first state the axioms 
for addition: 


(10.6) $x + 0 = x$, 
(10.7) $x + n' = (x + n)'$. 


From $\mathfrak{P}$ and $\mathfrak{P}_1$ we thus obtain axiom systems $\mathfrak{D}$ and $\mathfrak{D}_1$, respectively, 
to which the properties of monomorphy and nonmonomorphy, of 
completeness and decidability, are transferred. But these advantages are 
offset by a certain poverty in our means of expression. To be sure, we can 
still express such number-theoretical concepts as x < y or 3 is a factor of x: 


(10.8) $x < y \leftrightarrow \bigvee_z (z \neq 0 \wedge x + z = y)$, 
(10.9) $3 \mid x \leftrightarrow \bigvee_z (z + z + z = x)$. 
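
To make the definitions concrete, here is a minimal Python sketch under our own assumptions: the natural numbers are represented by Python's non-negative integers, the function names are hypothetical, and each existential quantifier is replaced by a bounded search, which suffices because a witness z can never exceed y in (10.8) or x in (10.9).

```python
def succ(n):
    """The successor n' on non-negative integers."""
    return n + 1

def add(x, n):
    """Addition computed only from (10.6) and (10.7)."""
    if n == 0:
        return x                     # (10.6): x + 0 = x
    return succ(add(x, n - 1))       # (10.7): x + n' = (x + n)'

def less(x, y):
    """(10.8): there is a z != 0 with x + z = y."""
    return any(add(x, z) == y for z in range(1, y + 1))

def three_divides(x):
    """(10.9): there is a z with z + z + z = x."""
    return any(add(add(z, z), z) == x for z in range(x + 1))
```

For example, `less(2, 5)` and `three_divides(9)` are true, while `less(5, 5)` and `three_divides(10)` are false.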


But it can be shown that other important concepts like x | y or x is a prime 
number cannot be defined, so that many interesting number-theoretical 
problems cannot be formulated and thus cannot be decided within the 
framework of these theories. 


35 It must be noted that in these formal systems multiplication does not occur and 
cannot be (explicitly) defined. 

In order to enrich our means of expression we adjoin the recursive 
definition of multiplication to the axioms of $\mathfrak{D}$ and $\mathfrak{D}_1$: 


(10.10) $x \cdot 0 = 0$, 
(10.11) $x \cdot n' = (x \cdot n) + x$. 


The resulting systems will be denoted by $\mathfrak{Z}$ and $\mathfrak{Z}_1$. In these systems we 
can define, for example, the following arithmetical concepts: 


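(10.12) $x \mid y \leftrightarrow \bigvee_z (y = x \cdot z)$, 
(10.13) $\mathrm{Prime}\ x \leftrightarrow x \neq 0 \wedge x \neq 0' \wedge \bigwedge_z (z \mid x \rightarrow (z = 0' \vee z = x))$. 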

Gödel has shown, although we have no space for his proof here, that all 
decidable properties and relations (§5.4), e.g., $z = x^y$, are now definable: 
the system $\mathfrak{Z}$ (or $\mathfrak{Z}_1$) includes the complete recursive theory of numbers. 

The (syntactical) consistency of $\mathfrak{Z}_1$ was proved by Gentzen in 1936. 

In comparison with the preceding systems, the investigation of $\mathfrak{Z}$ and $\mathfrak{Z}_1$ 
gives rise to considerably greater difficulties. Consider, for example, the 
existence of such unsolved number-theoretical problems as the Goldbach 
conjecture: 


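(10.14) $\bigwedge_z \bigl(2 \mid z \wedge z > 2 \rightarrow \bigvee_x \bigvee_y (\mathrm{Prime}\ x \wedge \mathrm{Prime}\ y \wedge z = x + y)\bigr)$. 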


Such problems make it plausible, as is in fact the case, that in these 
systems the set of consequences is not decidable. The truth of this statement 
results from the following theorem of Gödel, which is one of the most 
important discoveries in the whole theory of the foundations of mathe- 
matics. 
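
The difference between a single instance and the general statement can be seen in a short Python sketch. The function names are our own, and ordinary integer arithmetic replaces the formal definitions. Each instance of (10.14) for a given even z is settled by a finite search; the general statement is not settled by any number of such checks.

```python
def is_prime(x):
    """Primality by trial division (ordinary arithmetic, not the formal system)."""
    return x > 1 and all(x % z for z in range(2, int(x ** 0.5) + 1))

def goldbach_witness(z):
    """Return primes (x, y) with z = x + y and x <= y, or None if there are none."""
    for x in range(2, z // 2 + 1):
        if is_prime(x) and is_prime(z - x):
            return x, z - x
    return None

def goldbach_holds_up_to(bound):
    """Check the instance of (10.14) for every even z with 2 < z <= bound."""
    return all(goldbach_witness(z) is not None for z in range(4, bound + 1, 2))
```

A call such as `goldbach_holds_up_to(1000)` confirms finitely many instances and says nothing about the even numbers beyond the bound.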


10.5. The Gödel Incompleteness Theorem: $\mathfrak{Z}_1$ Is Incomplete (Even 
Essentially Incomplete; cf. End of the Present Subsection) 

Although it will be impossible to include many of the details, we wish 
to give an outline here of the proof of this theorem, partly on account of 
its great importance, but also in order that the reader may see how an 
argument which in a natural language leads to a contradiction (namely 
to the Antinomy of the Liar described in §11.3) can in a formal language 
be put to good use, namely, to prove the incompleteness of $\mathfrak{Z}_1$. 

An important instrument in the proof is the arithmetization described 
in §5.4, where we have shown that a procedure can be set up whereby 
the formulas of the language are characterized by their so-called Gödel 
numbers. Since it is decidable whether or not a given formula is a relevant 
expression,³⁶ it is also decidable whether a given natural number is the 
Gödel number of some relevant expression. Since we have shown in §10.4 
that all decidable properties can be defined in $\mathfrak{Z}_1$, there exists a relevant 
expression $A(x)$ which in the natural interpretation (i.e., the interpretation 
in which 0 corresponds to zero, and so forth) holds for a natural number 
if and only if this number is the Gödel number of an expression in $\mathfrak{Z}_1$. 

36 A relevant expression here is the same as a relevant proposition in §4.5. 

Finite sequences of relevant expressions can be represented by numbers 
in the same way as the expressions themselves, so that, in particular, 
proofs can be expressed by numbers, since they are merely special 
sequences of expressions. Since it is decidable whether a given rule of 
inference has been correctly used, we can now find a relevant expression 
$C(p, q)$ which in the natural interpretation is true for $p$ and $q$ if and only if $p$ 
is the number of a relevant expression $H$ and $q$ is the number of a proof 
of $H$ in $\mathfrak{Z}_1$. 

We now proceed to construct a relevant expression $E$, containing no 
free variables, which in the natural interpretation states that $E$ (in other 
words, the expression itself) is unprovable (cf. the Paradox of the Liar 
in §11.3). If we assume that $E$ is provable, we then have the following 
situation: Every model of $\mathfrak{Z}_1$, and consequently also the natural 
interpretation, satisfies $E$ and therefore states, in contradiction to our 
assumption, that $E$ is unprovable. On the other hand, if we assume that 
$\neg E$ is provable, the natural model will satisfy $\neg E$, and therefore falsify $E$; 
that is, $E$ is provable, a result which, taken together with the provability 
of $\neg E$, contradicts the consistency of $\mathfrak{Z}_1$. Thus neither $E$ nor $\neg E$ is 
provable. 

This syntactical result, when reformulated in semantic language, states 
that neither $E$ nor $\neg E$ is a consequence of $\mathfrak{Z}_1$. In other words, $\mathfrak{Z}_1$ is 
incomplete, as asserted. 

The expression $E$, which asserts its own unprovability, is constructed as 
follows: If $n$ is the Gödel number of an expression with exactly one free 
variable $x$, let us denote this expression by $A_n(x)$ and call $n$ an A number. 
We construct the propositional form 


(10.15) $x$ is an A number and $y$ is the Gödel number of a proof of $A_x(x)$. 


By means of the arithmetization, this propositional form can be 
represented by an expression $B(x, y)$ in $\mathfrak{Z}_1$ with the two free variables 
$x$ and $y$. Now let $p$ be the Gödel number of the expression $\bigwedge_y \neg B(x, y)$. 
We form the expression $A_p(p)$ obtained by replacing $x$ with $p$ in $A_p(x)$. 
By (10.15) this expression states: for every $y$, the number $y$ is not the Gödel 
number of a proof of $A_p(p)$. Thus $A_p(p)$ is a proposition $E$ of the desired 
kind. 
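
The substitution of an expression's own number into the expression itself has a simple analogue for rows of symbols. The following Python sketch is only an analogy and makes no claim about provability: the function `diag` and the text of the template are our own, a quoted string plays the part of a Gödel number, and the placeholder `{}` plays the part of the free variable x.

```python
def diag(template):
    """Put the quoted template in place of the placeholder {} in the template itself."""
    return template.format(repr(template))

# Plays the role of the expression with Gödel number p (one free place, marked {}).
template = "diag({}) is not provable"

# Plays the role of A_p(p): the template applied to its own quotation.
sentence = diag(template)

# The grammatical subject of the sentence is a Python expression ...
subject = sentence[: -len(" is not provable")]
# ... and evaluating that expression yields the sentence itself.
refers_to_itself = eval(subject) == sentence
```

The string `sentence` thus speaks about the very row of symbols that it is, without containing a complete copy of itself.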

This theorem can obviously be extended to all axiomatic theories that 
have constructive definitions for their expressions and rules of inference, 
and that include a sufficiently large part of arithmetic. 

The incompleteness theorem has some remarkable consequences: 


(1) There exist arithmetical propositions (e.g., $E$) that are true for 
the natural numbers but are not provable in $\mathfrak{Z}_1$. It is conceivable, for 
example, that the Fermat conjecture or the proposition (10.14) is true 
but cannot be deduced by means of the familiar rules of inference in $\mathfrak{Z}_1$. 


(2) From the incompleteness of $\mathfrak{Z}_1$ it follows by §4.6 that $\mathfrak{Z}_1$ is not 
monomorphic. For example, the proposition $E$ is true for the model of 
the natural numbers but certainly untrue for some other model of $\mathfrak{Z}_1$, 
since $E$ is not a consequence of $\mathfrak{Z}_1$. 


(3) If we introduce into $\mathfrak{Z}$ certain natural rules of inference (it is to be 
noted that the language in which $\mathfrak{Z}$ is formulated goes beyond the means 
of expression available in the predicate logic), we can prove, just as for $\mathfrak{Z}_1$, 
that there exists in $\mathfrak{Z}$ a proposition $E$ such that neither $E$ nor $\neg E$ is 
deducible. Then we could proceed, again just as for $\mathfrak{Z}_1$ (see above), to 
prove that $\mathfrak{Z}$ is incomplete, provided we were allowed, as is the case in $\mathfrak{Z}_1$, 
to replace the concept of provability by the concept of a consequence. 
But we know that $\mathfrak{Z}$ is complete, as may be proved in exactly the same 
way as for $\mathfrak{P}$ in §10.2. Thus we have the important result that in $\mathfrak{Z}$, and 
more generally in the logics of higher order as contrasted with the predicate 
logic, the concept of a consequence cannot be reduced to an algorithm. 

One might think that the incompleteness of $\mathfrak{Z}_1$ could be removed by 
the introduction of further axioms that would leave the system consistent. 
But so long as we are dealing with finitely many axioms (or more generally 
with a decidable schema of axioms), the concept of provability remains 
decidable, so that the above argument can be applied to the enlarged 
system of axioms. Thus we are dealing here with an essential, nonremovable 
incompleteness. 

These results for $\mathfrak{Z}_1$ and $\mathfrak{Z}$ can also be obtained in the following way. 
We can show that in any sufficiently expressive arithmetical language 
there always exists, for any given recursively enumerable set (§5.3) $M$ of 
arithmetical theorems [i.e., arithmetical propositions that are valid in the 
natural interpretation (§10.5)], an arithmetical proposition $E$ which, 
together with its negation, does not belong to $M$. Thus we have: 


(a) Since the set of deductions in $\mathfrak{Z}_1$ is recursively enumerable (§6.2), 
the system $\mathfrak{Z}_1$ is incomplete; 

(b) The system $\mathfrak{Z}$, like $\mathfrak{P}$, is monomorphic and therefore complete 
(§10.2). Thus the set of deductions in $\mathfrak{Z}$ is not recursively enumerable, 
and therefore certainly not decidable. 


For a system of axioms $\mathfrak{S}$ that includes arithmetic we can also construct, 
by means of our arithmetization, a proposition $W$ expressing the 
syntactical consistency of $\mathfrak{S}$. Then the Gödel theorem leads to the result that $W$ 
is not deducible in $\mathfrak{S}$, provided $\mathfrak{S}$ is consistent. Consequently, in order to 
prove the consistency of $\mathfrak{S}$ we must make use of methods that lie outside $\mathfrak{S}$. 


10.6. The Operational Construction of Arithmetic 

In this construction the theorems of arithmetic and of other branches 
of mathematics are regarded, without reference to any possible semantic 
interpretation, as statements concerning the application of certain rules 
of operation with finite systems, which may consist of numerals or of 
concrete objects of any kind. If we study these systems (which are made up 
of finitely many “atoms” or indivisible systems), we can distinguish them 
according to their “length,” and in this way we necessarily arrive at the 
conception of a number. By “abstraction” from systems of the same 
length we obtain the fundamental numbers, which can be uniquely 
represented by systems such as |, ||, |||, ... (Lorenzen). Propositions, rules 
of inference, sets, and so forth are again merely systems or “terms” 
(possibly with certain rules of transition from one system to another). 
The fundamental rules of operation are given in the form of algorithms, 
on the basis of which further systems and rules can be “deduced.” However, 
this “deducibility” must be of an obviously “constructive” nature; 
in his “protologic,” Lorenzen gives a number of principles of deduction 
that can be considered constructive. 

The operative construction of arithmetic can only be briefly indicated 
here (see also §5.2). The system for generating the numerals is defined 
by an algorithm with one axiom and one rule, involving the proper 
variable e (cf. §5.2): 


(10.16) $|$, 

(10.17) $\dfrac{e}{e\,|}$. 

Equality is defined by the following algorithm ($k$, $l$ are variables for 
numerals): 

(10.18) $| = |$, 

(10.19) $\dfrac{k = l}{k| = l|}$. 
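
Both algorithms are easily imitated in Python. The sketch below rests on our own assumptions: numerals are represented by strings of strokes, the function names are hypothetical, and the inputs of `derivable_equal` are taken to be numerals. Its last function runs the rules backwards, which is possible because an equation other than the axiom (10.18) can arise only as the conclusion of rule (10.19).

```python
def numerals(steps):
    """Axiom (10.16) gives "|"; rule (10.17) leads from e to e|."""
    derived = ["|"]
    for _ in range(steps):
        derived.append(derived[-1] + "|")
    return derived

def equalities(steps):
    """Axiom (10.18) gives | = |; rule (10.19) leads from k = l to k| = l|."""
    derived = [("|", "|")]
    for _ in range(steps):
        k, l = derived[-1]
        derived.append((k + "|", l + "|"))
    return derived

def derivable_equal(k, l):
    """Decide whether k = l is derivable, for numerals k and l."""
    if (k, l) == ("|", "|"):
        return True                               # axiom (10.18)
    if len(k) > 1 and len(l) > 1:
        return derivable_equal(k[:-1], l[:-1])    # only rule (10.19) can end here
    return False
```

For example, `derivable_equal("|||", "|||")` is true and `derivable_equal("||", "|||")` is false.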


By various principles of deduction we now realize that: 

(10.20) $k| = l| \rightarrow k = l$, 

(10.21) $k| \neq |$, 

(10.22) $k = l \wedge A(k) \rightarrow A(l)$ (so-called principle of equality), 

(10.23) $A(|) \wedge \bigwedge_k (A(k) \rightarrow A(k|)) \rightarrow A(l)$ (so-called principle of 
induction). 

The significance of (10.20) is that the rule 


(10.24) $\dfrac{k| = l|}{k = l}$ 


is superfluous, i.e., in the algorithm for equality nothing can be deduced 
with this rule that cannot be deduced without it, as follows from the 
so-called principle of inversion: since $k| = l|$ can be obtained only from 
$k = l$, it follows that $k = l$ must also be deducible. 

The atom $\wedge$ is introduced by a rule which is identical with the rule 
for $\wedge$-introduction in §6.4, but the rules in §6.4 for the elimination of $\wedge$ 
are not required here, since the principle of inversion shows that they 
are superfluous. 

The systems (10.20)-(10.23) correspond to the Peano axioms; but in 
the present case they are not “postulated” but follow from certain 
“protological” theorems applied to the arithmetical algorithm. 

One advantage of this construction of mathematics lies in the fact that 
by its very nature it leads only to propositions that can be seen intuitively 
to be true and therefore cannot involve contradictions. 


Exercises for §10 


1. On the basis of the axioms (P1), (P2), (Ind) and (G) prove the following 
theorem: 
$\bigwedge_P \bigl(P0 \wedge P0' \wedge \bigwedge_x (Px \rightarrow Px'') \rightarrow \bigwedge_x Px\bigr)$. 


2. To (P1), (P2), (Ind), (G), (10.6) and (10.7) 
adjoin the axioms 
$0^2 = 0$, 
$(x')^2 = x^2 + x + x + 0'$. 
Then show that in the resulting system it is possible to define the relation 
that holds for x, y and z if and only if $x \cdot y = z$. 


Bibliography 


Elementary problems in the foundations of arithmetic are discussed in Tarski 
[1]. For the theory of the systems $\mathfrak{Z}$ and $\mathfrak{Z}_1$ see Russell [1]. On the concept 
of arithmetic itself see Frege [1]. 


11. Antinomies 


11.1. Classification of the Antinomies 

A proposition (or a propositional form) together with its negation form 
a contradiction. By an antinomy or paradox we mean an argument that 
leads to a contradiction. 

It is natural to ask what could be the nature of such an argument. 
This question is most easily answered if we are dealing with an algorithm, 
since an antinomy then consists in the deduction of a proposition and of 
its negation. Since only formal processes are involved here, we speak of 
a syntactic antinomy (for the concepts of syntax and semantics cf. §3.1). 

But it can also be the case that an argument which leads to a contra- 
diction is not truly formal but depends on the meaning of the propositions 
(or of parts of them) that are used in the argument. In this case we speak 
of a semantic antinomy. 

Since algorithms in the strict sense of the word are very recent inventions, 
it is not surprising that syntactic antinomies have been known for a 
relatively short time. On the other hand, many semantic antinomies were 
already discussed in antiquity. 

If we can deduce a proposition and its negation, then by the rule of 
$\neg$-elimination (see §6.4) we can deduce any proposition. But if we can 
deduce everything, there is no interest in constructing arguments. As a 
result, we reject any algorithm that leads to a syntactic antinomy. As for 
semantic arguments leading to a semantic contradiction, we must make 
up our minds to revise at least one detail of the intuitive truths “inserted” 
into the argument, but it is often very difficult to accomplish this change 
in a convincing way. 

Syntactic antinomies can also lead, at least indirectly, to a revision of 
our intuitive ideas. In general, an algorithm is not set up arbitrarily but 
is based on certain of our intuitive conceptions, which it presents in a 
concentrated form. Thus, if we find an antinomy in such an algorithm, 
we must realize either that the conceptions are not adequately represented 
in the algorithm, or else that they must be rejected, at least to some extent. 

We confine ourselves here to a detailed description of two antinomies: 
the Russell Antinomy, as a characteristic example of a syntactic antinomy, 
and the Antinomy of the Liar, as a characteristic example of a semantic 
antinomy. 


11.2. The Russell Antinomy 


We are dealing here with a system of axioms in the language of predicate 
logic, so that the deductions can be obtained by means of an algorithm. 
The intuitive conceptions at the basis of this system of axioms are of a set- 
theoretical nature (cf. §7). Let us describe them briefly: there exists a 
property defined by the predicate “x is an element of the set y.” We 
represent this predicate by the symbol Exy (that is, we use the symbol 
Exy of predicate logic to mean x € y). Sets are represented by propositional 
forms with one variable; for example, the set of even numbers is repre- 
sented by the propositional form 


(11.1) $2 \mid x$, 

and the set of prime numbers by the propositional form 

(11.2) $x > 1 \wedge \bigwedge_y (y \mid x \rightarrow y = 1 \vee y = x)$. 


Now if we assume, as seems natural, that every propositional form H 
with a variable x corresponds to a set y containing exactly those objects 
which satisfy H, we are led to require as part of our system of axioms that 


(11.3) $\bigvee_y \bigwedge_x (Exy \leftrightarrow H)$. 


It is to be noted that this requirement is not a single axiom but an axiom 
schema (cf. §4.1), since (11.3) is a prerequisite for every propositional 
form H containing x (but not y) as a free variable. 

The Russell Antinomy now consists of showing that this schema of 
axioms, within the framework of predicate calculus, leads to a contra- 
diction. 

The contradiction is obtained by taking for H the propositional form 
$\neg Exx$. Then the set y, whose existence is required by (11.3) (and whose 
uniqueness, unimportant here, follows from the principle of extension- 
ality), is the set consisting of every set that does not contain itself as an 
element. But this set y gives rise to a contradiction if we ask whether 
or not y is a member of itself. For if y is an element of itself, then y, 
precisely because it is an element of itself, cannot, by definition, be an 
element of itself. On the other hand, if y is not an element of itself, then, 
again by the definition of y, it must be an element of itself. Let us deduce 
the contradiction by a formal argument. In addition to the rules in §6, 
our set of axioms now includes all the special cases of (11.3) (see the 
following table). 


Line Number | Flagged Variable | Assumptions | Assertion | Rule Used 

1 | | | $\bigvee_y \bigwedge_x (Exy \leftrightarrow \neg Exx)$ | axiom 
2 | y | | $\bigwedge_x (Exy \leftrightarrow \neg Exx)$ | $\bigvee$-elimination (1) 
3 | | | $Eyy \leftrightarrow \neg Eyy$ | $\bigwedge$-elimination (2) 
4 | | | $Eyy \rightarrow \neg Eyy$ | $\leftrightarrow$-elimination (3) 
5 | | | $\neg Eyy \rightarrow Eyy$ | $\leftrightarrow$-elimination (3) 
6 | | $Eyy$ | $Eyy$ | introduction of assumption 
7 | | | $Eyy \rightarrow Eyy$ | elimination of assumption (6) 
8 | | $\neg Eyy$ | $\neg Eyy$ | introduction of assumption 
9 | | | $\neg Eyy \rightarrow \neg Eyy$ | elimination of assumption (8) 
10 | | | $Eyy \vee \neg Eyy$ | excluded middle 
11 | | | $Eyy$ | $\vee$-elimination (5, 7, 10) 
12 | | | $\neg Eyy$ | $\vee$-elimination (4, 9, 10) 
13 | | | $Ezz$ | $\neg$-elimination (11, 12) 
14 | | | $\neg Ezz$ | $\neg$-elimination (11, 12) 

Lines 1-13 provide a finished proof for $Ezz$, and lines 1-14 for $\neg Ezz$, 
so that the contradiction is proved.³⁷ 

This antinomy indicates that we must in some way revise the set- 
theoretical conceptions underlying the axioms (11.3). As a result, it is 
no longer assumed today that every propositional form defines a set 
(cf. §7.6). 
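
That the instance of (11.3) with $\neg Exx$ for H cannot be satisfied may also be observed on small finite domains. The following Python sketch is only an illustration under our own assumptions: a finite set of objects is taken as domain, every binary relation on it is tried as interpretation of E, and the function names are hypothetical.

```python
from itertools import product

def has_russell_set(relation, domain):
    """True if some y satisfies, for all x: E(x, y) holds exactly when E(x, x) fails."""
    return any(
        all(((x, y) in relation) == ((x, x) not in relation) for x in domain)
        for y in domain
    )

def count_models(size):
    """Count the relations E on `size` objects that satisfy the instance of (11.3)."""
    domain = range(size)
    pairs = list(product(domain, repeat=2))
    count = 0
    for bits in product((False, True), repeat=len(pairs)):
        relation = {p for p, bit in zip(pairs, bits) if bit}
        if has_russell_set(relation, domain):
            count += 1
    return count
```

For domains with 1, 2, or 3 objects the count is 0: the case x = y already makes the required equivalence fail, exactly as in line 3 of the table.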


11.3. The Antinomy of the Liar 


This antinomy, already well known in antiquity, makes use of the 
concept of truth (cf. also §3) and is thus a semantic antinomy. We begin 
with the stipulation already stated in precise form by Aristotle, that a 
proposition is true if and only if it describes an actual state of affairs. 
As a concrete example, let us consider the proposition “it is snowing.” 
Then we can say: 


(11.4) “it is snowing” is true if and only if it is snowing. 


But this proposition, consisting of the whole of line (11.4), remains true 
if we replace the proposition “it is snowing” by any other proposition. 
Thus we are led to recognize the validity of all propositions of the following 
form: 


(11.5) ... is true if and only if ---, 


where in place of “---” we may put an arbitrary proposition, provided 
that at the same time we put a name of this proposition in place of “...”. 
In order to obtain the Antinomy of the Liar we consider the particular 
proposition: 


(11.6) The proposition that follows “(11.6)” is not true. 


In other words, the proposition asserts its own falsity. We now insert 
this proposition in (11.5) in place of “---” and at the same time we insert 
a name for this proposition in place of “...”. For such a name we choose: 
“the proposition that follows ‘(11.6)’.” Then as a special case of (11.5) 
we obtain: 


(11.7) The proposition that follows “(11.6)” is true if and only if the 
proposition that follows “(11.6)” is not true. 


But from (11.7) it is easy to obtain a contradiction (cf. the Russell 
Antinomy starting from line 3 of the proof). 

This contradiction cannot be avoided as long as we agree to the following 
conditions: we accept the Aristotelian criterion of truth (11.5), we admit 
that what is contained in the line (11.6) is a proposition and that “the 
proposition that follows ‘(11.6)’” is a name for this proposition, and 
finally we accept the elementary logical deductions that lead from (11.7) 
to an actual contradiction. 

37 We could not stop with line 11 or 12, since they still contain a free occurrence of 
the flagged variable y. 

If now, faced with this contradiction, we ask at what stage we should 
change our point of view, it would be natural to look first at the Aristo- 
telian criterion of truth (11.5). Yet it must be admitted that propo- 
sition (11.5) seems almost self-evident and that we would never have felt 
any doubt about it if the antinomy had not been brought to our attention. 
Moreover, we must take note of the fact that in a certain respect we have 
already made use of this criterion in §3.4, where we discussed the validity, 
in a certain interpretation, of an elementary propositional form 
$Px_1, \ldots, x_n$. For we can express the Aristotelian criterion, as applied to that 
special case, in the form: 


(11.8) If we replace x by 3 and P by the property of being a prime 
number, then Px is true if and only if 3 has the property of being 
a prime number. 


The similarity with (11.5) is unmistakable. 

But this comparison indicates how we can attack the Antinomy of the 
Liar. In (11.8) the problem at issue is to define what is meant by saying 
that a given propositional form is true in a given interpretation. Now the 
propositional form Px belongs to the language of predicate logic but the 
desired definition will be given, not in the language of predicate logic, but 
in some other language, namely whatever language we use for talking about 
predicate logic. Our choice for such a language is everyday English, 
cautiously used in a somewhat refined form. The predicate “is true” 
introduced in (11.8) belongs to this everyday language but refers not to 
propositions of everyday language, but to propositional forms in the 
language of predicate logic (in conjunction with the given interpretations). 

Thus the difference between (11.5) and (11.8) is essentially as follows: 
in (11.8) we are dealing not only with a given language (the language of 
predicate logic) but also with a metalanguage (everyday English), in 
which we speak about the first language. The predicate “is true” in (11.8) 
is a predicate in the metalanguage. But it refers not to propositions in 
the metalanguage, but to expressions in the first language. In (11.5), 
on the other hand, there is only one language, namely everyday English. 
The predicate “is true” occurring there belongs to this everyday language 
and also refers to propositions in the same language. 

Now it is easy to see that in (11.8) no antinomy is to be feared (or at any 
rate we cannot so easily construct one as in the Antinomy of the Liar). 
For the Antinomy of the Liar is based on a proposition that states its 
own falsity. But such a situation is not possible (or at any rate not 
immediately possible) if we distinguish between language and 
metalanguage. For in that case we cannot form a proposition that states its own 
falsity. Such a proposition, call it $\alpha$, must belong to the metalanguage, since 
it contains the word “true” (or “false”); but the word “true” in the 
metalanguage refers to propositions of the initial language and therefore 
cannot refer to $\alpha$. 

In summary, we may say: we can escape from the Antinomy of the 
Liar by distinguishing between language and metalanguage and by 
speaking about the truth of the propositions in a given language—not 
in that language itself but in a metalanguage. Such distinctions between 
a formal language and a metalanguage, or a meta-metalanguage and so 
forth, are common in modern logical investigations. Since the natural 
languages of the world are “universal” and fail to make this distinction, 
in the sense that they use the word “true” for arbitrary propositions 
expressible in them, many investigators consider these natural languages 
to be inevitably self-contradictory. 

As a final remark, let us point out that the other semantic antinomies 
can be avoided when we make the distinction between language and 
metalanguage. Consider, for example, the antinomy of the smallest 
natural number that cannot be described in English in fewer than a 
hundred words. The antinomy arises from the fact that, precisely in the 
definition just given, this number has nevertheless been described in 
fewer than a hundred words. But the above definition refers to all possible 
descriptions and thus, since it speaks of these descriptions, it must belong 
to a language that is a metalanguage with respect to the language to 
which the descriptions belong. Consequently, we obtain in the metalan- 
guage a description for the number which is shorter than any possible 
description in the initial language. But this result is not a contradiction. 


Exercises for §11 


1. An adjective A is said to be autologic if A has the property described
by A, and otherwise A is heterologic. Examples of autologic adjectives
are: “seventeenlettered,” “English,” “pentasyllabic.” Consider the
word “heterologic.” Is it heterologic or autologic? Explain and
resolve the antinomy (Grelling’s antinomy).


2. If the definition of an object or element m depends on a set M and if 
m is then assigned to M as an element, the definition of m is said to 
be impredicative. 


(a) Show that the antinomies mentioned in the text make use of 
impredicative definitions. 


(b) Show that the definition of the least upper bound of a set M of 
real numbers, as given in real analysis, is impredicative.


PART B


ARITHMETIC AND ALGEBRA 


Introduction 


The development of modern algebra since the beginning of the present 
century has been a process of continually increasing abstraction, so that 
the subject was often called abstract algebra. It was realized that important
simplifications could be gained, both in concept and in method, if for 
the various fields of arithmetic, the theory of numbers, algebraic equations, 
functions of a complex variable, and so forth we establish as clearly as 
possible what is common to these subjects and then present it in a form 
that is valid for all of them. For it often happens that theorems that have 
been discovered and proved in widely different fields of mathematics 
are found to be identical from the logical point of view, so that the proof 
can be carried out quite independently of the various interpretations in 
one field or another. In fact, the proof is generally much simpler and 
clearer when these particular interpretations are set aside; moreover, 
we can spare ourselves the trouble of proving the same theorem over and 
over again, since the general “abstract” proof is valid for all the “concrete”
cases. 

Since mathematics is in itself a very abstract science, the reader may feel 
surprised that certain branches of it are described as “abstract.” Let us
examine the situation. 

The concept of a natural number is already the result of a complicated 
process of abstraction by no means easy to retrace (cf. IA, §10.1, and 
IB1, §1.1), and we are scarcely conscious of it in everyday calculations. 
But the immense intellectual effort involved in first making this abstraction 
has been richly rewarded by our being able to apply the simple rules of 
arithmetic to problems dealing with any kind of objects—stones, trees, 
lengths, weights, and so forth.

The same remark can be made about geometry. The concept of a
triangle, which underlies the theorems of geometry, is already extremely 
abstract; it means only that we are dealing with a figure consisting of 
three points and the lines that join them. But then the theorems we deduce, 
e.g., that the sum of the angles of a triangle is equal to two right angles 
or that the sum of two sides is greater¹ than the third side, are valid for
all possible triangles, regardless of their size, shape, or origin. The task 
of setting up the abstract geometric concept of a triangle demands a 
massive intellectual effort, but this effort is far more than offset by the 
simplicity and generality of the resulting theorems, which can now be 
applied to all possible triangles. 

Now it is reasonable to expect a similar advantage from what we may 
call a second stage of abstraction, namely, from the fact that certain 
concepts and methods in the various branches of present-day mathematics 
can be identified with one another if we make an abstraction from their 
interpretations in various special fields. 

The objects of study in abstract algebra are sets of an extremely general 
nature; their elements may be numbers, polynomials, functions, vectors, 
transformations, or any conceivable entities, whose meaning in any 
particular branch of mathematics is quite irrelevant. These sets have an 
algebraic structure consisting in certain relations or laws of combination 
among the elements within the set, in the existence of certain distinguished 
subsets, and so forth (see also IB10). Examples are groups (IB2) together 
with their subgroups, modules (IB1, §2.3), and lattices (IB9); Chapter IB5
will deal with general (commutative) rings and integral domains, whose 
structure is characterized by the presence of certain distinguished subrings, 
namely, the ideals. Special rings also occur in other chapters; for example, 
the ring or integral domain of rational integers (IB1, IB6), the field of 
rational numbers (IB1, IB6), rings of algebraic numbers and the field of
algebraic numbers (IB6, IB7), rings of polynomials (IB4), rings of matrices
(IB3), rings of groups (IB2), rings of endomorphisms (IB1, §2.4), and so forth.


1 Or at most equal, if we allow the three vertices to lie on one line. 


CHAPTER 1 


Construction of the System of Real Numbers 


1. The Natural Numbers 


1.1. The Peano System of Axioms 


The simplest approach to the natural numbers (in the present section 
they are simply called numbers) is provided by the common practice of 
counting objects by making marks on paper, so that the number of 
objects is represented by a row of marks, for example ||||. This procedure 
suggests that we define the natural numbers as the diagrams obtained by 
writing vertical strokes one after the other. The number | is also written
in the form 1 and is called “one.” The number formed by writing a
vertical stroke to the right of the number a is called the successor of a;
in the present §1 (but only here) we write this number¹ in the form a’.
Equality of two numbers is defined as follows: beginning from the right- 
hand end (many other procedures would also be possible), we attempt 
to make a one-to-one correspondence between the two sets of strokes. 
If such a correspondence can be set up (as in the diagram) we say that 
the numbers are equal; otherwise they are unequal. 

[Diagram: two rows of vertical strokes paired off one-to-one from the right-hand end.]

1 The symbol a| would be quite adequate but we do not adopt it here, partly for
typographical reasons and partly because we want to keep our notation independent
of any particular method of introducing the natural numbers.

We see at once that the logical requirements for a definition of equality
(see §2.2) are satisfied here, that a’ ≠ 1 for every natural number a,
and finally that a’ = b’ is equivalent to a = b. Every number can be
formed from the number 1 by repeated construction of a successor.
Consequently, any property that belongs to the number 1 and is hereditary,
i.e., is bequeathed by each number to its successor, belongs to every
number. Let us summarize this information in the following system of
axioms:


I. 1 is a number.


II. To every number a there corresponds a unique number a’, called its
successor.


III. If a’ = b’, then a = b.

IV. a’ ≠ 1 for every number a.


V. Let A(x) be a proposition containing² the variable x. If A(1) holds
and if A(n’) follows from A(n) for every number n, then A(x) holds
for every number x.
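
The stroke model behind axioms I-V can be sketched in a few lines of Python. The names `ONE`, `successor`, `equal`, and `holds_up_to` are our own illustrative choices, and the last function is only a finite spot check of an inductive argument, not the deductive principle expressed by axiom V.

```python
# Hypothetical model: a natural number is a non-empty row of strokes.
ONE = "|"

def successor(a: str) -> str:
    """Axiom II: write one more stroke to the right of a."""
    return a + "|"

def equal(a: str, b: str) -> bool:
    """Pair off strokes from the right-hand end; equal iff none is left over."""
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 and j >= 0:
        i -= 1
        j -= 1
    return i < 0 and j < 0

def holds_up_to(prop, bound: int) -> bool:
    """Check A(1) and the step from A(n) to A(n') for the first `bound` numbers.

    This inspects finitely many cases only, so it illustrates the shape of
    an inductive argument without replacing it.
    """
    n = ONE
    if not prop(n):
        return False
    for _ in range(bound):
        if prop(n) and not prop(successor(n)):
            return False
        n = successor(n)
    return True
```

For instance, `equal(successor(successor(ONE)), "|||")` pairs off three strokes against three, while no successor is ever equal to `ONE`, in agreement with axiom IV.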


From this system of axioms (which is usually named after Peano; 
cf. IA, §10) we shall see that by logical reasoning we can derive any 
theorem about the natural numbers without further reference to the way 
in which they were introduced. Thus a reader who for any reason is 
dissatisfied with our definition of natural numbers may adopt any other 
definition that leads again to I-V, and then he can follow our further 
developments. Our reason for setting up a system of axioms is not that 
there is anything inexact about the procedure³ using vertical strokes; the
system of axioms simply sets us free from this particular procedure. For
example, we could define cardinal numbers as classes of equivalent sets
(see IA, §7.3) whereupon⁴ we would quickly arrive at I-IV; then axiom V
serves to distinguish the natural numbers among all the cardinal numbers: a
cardinal number is a natural number if and only if it possesses every
hereditary property that belongs to the number 1.

Axiom V is called the axiom of induction, or the principle of complete 
(or mathematical) induction (on n) or also the argument from n to n + 1.
The “complete” induction of mathematics is thus in sharp contrast with
the “incomplete” induction of the experimental sciences, where a general
law is derived from (finitely many) individual cases. This unfortunate 
choice of name must not be allowed to obscure the fact that in complete 
induction we are dealing with a deductive principle and not with the 
verification of a proposition A(x) for a finite number of x values; for in 
fact, in applying the principle, we are required to show that for an arbitrary 


2 Thus A(7) is the proposition that is formed when x is replaced by x. Strictly speaking, 
A(x) is a propositional form (see IA, §2.3). 

8 Any apparent inexactness is due to the brevity of these introductory remarks. 
A complete description of the operational method of introducing numbers can be 
found in P. Lorenzen [1]. See also IA, §10.6. 

4 The number 1 is now the class of those sets that contain only one element; and if 
a is the class of sets that are equivalent to a given set M, its successor a’ is the class of 
sets equivalent to M’, where M’ is formed from M by adjoining an element not yet in M. 
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nthe proposition A(n’) always follows from A(m), and such a proof can 
only depend on some general procedure, not on any special knowledge 
for a given number n. We refer to A(1) as the initial case, to the argument 
from A(n) to A(n’) as the induction step, and to A(m) as the induction 
hypothesis. \f N is the set of natural numbers, axiom V can also be 
expressed as a proposition about an arbitrary set M: 

If 1eM and if n'é€ M follows from née M for every natural number n, 
then N © M;° for we may write any proposition A(x) in the form x € M, 
where M is the set of those elements x that have the property A(.x). 

It should also be mentioned that the choice of axioms is to a great 
extent arbitrary; it is only necessary that they imply exactly the same 
consequences as can be deduced from I-V; that is, they must imply the 
axioms I-V and be implied by them. Instead of “one” and “successor”
we may introduce other fundamental concepts, e.g., the ordering defined
later in §1.4 (the relation of “smaller than”).⁵ᵃ


In the lower grades at school the natural numbers occur in the form of 
cardinal numbers; in other words, the number 3 is introduced by abstraction 
from sets of three objects (persons, marks, points or the like). The essential 
identity of (finite) cardinal and ordinal numbers is brought out by arranging 
objects in rows. Addition arises as the mathematical expression for putting 
sets of objects together (forming their union) or by extending the rows of 
objects (this process is recognizable in the recursive definition of addition 
in §1.3). The other rules for calculating with natural numbers are based on 
addition. Further work with natural numbers depends on the familiar rules 
of calculation (commutative and associative laws of addition and multiplication, 
distributive law, monotone laws), which in the following pages are derived 
from axioms I-V but in early school years are learned by experience without 
any explicit formulation. Thus, in early instruction these rules play the role 
of axioms; much later, in the more advanced grades, they are supplemented 
by the principle of complete induction. 


1.2. Recursive Definitions 

In order that the sum of the numbers of elements in two disjoint finite 
sets (for these concepts see §1.5) may be equal to the number of elements 
in the union of the two sets, the following equations must obviously be 
satisfied: 


(1) a + 1 = a’,
(2) a + b’ = (a + b)’.


5 The notation M ⊂ M’ means that x ∈ M’ follows from x ∈ M but M ≠ M’.
In this case M is called a proper subset of M’, but in the case M ⊆ M’ (that is, if equality
is also possible) M is simply called a subset of M’ (cf. IA, §7.2).

5a See, e.g., Feigl and Rohrbach [1].

So we have the task of introducing for every a ∈ N a function f which
is defined in N and has the properties 


(3) f(1) = a’, f(x’) = f(x)’ for all x ∈ N;


for then we can simply define a + b as f(b). The fact that for every number 
a there exists exactly one function f with the properties (3) is a special 
case of the following general theorem: 


Let c be a number and let F be a function of two arguments defined in N 
and with values in N. Then there exists exactly one function f defined in N 
such that 


(4) f(1) = c, f(x’) = F(x, f(x)) for all x ∈ N.


It is clear that (3) is obtained from (4) by setting c = a’ and F(x, y)=y’. 
The definition of a function f by the conditions (4), which is possible in 
view of the general theorem, is called a recursive definition, since the 
determination of f(x’) is reduced to that of f(x) and thereby finally to that 
of f(1). 
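
As a rough illustration of how a recursive definition of the form (4) determines a function, the following Python sketch builds f from a starting value c and a step function F, and then recovers addition from the special case (3). The helper names `define_by_recursion`, `succ`, and `add` are hypothetical, and Python integers stand in for the natural numbers 1, 2, 3, ... .

```python
def define_by_recursion(c, F):
    """Return the function f with f(1) = c and f(x + 1) = F(x, f(x))."""
    def f(x: int):
        if x < 1:
            raise ValueError("natural numbers start at 1 here")
        value = c                    # f(1) = c
        for k in range(1, x):
            value = F(k, value)      # f(k') = F(k, f(k))
        return value
    return f

def succ(n: int) -> int:
    """The successor n' of n."""
    return n + 1

def add(a: int, b: int) -> int:
    """a + b as f(b), where f comes from (4) with c = a' and F(x, y) = y'."""
    f = define_by_recursion(succ(a), lambda x, y: succ(y))
    return f(b)
```

With these definitions `add(3, 4)` evaluates f(4) by starting at 3’ and taking three further successors.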

To prove this theorem, which is also called the principle of recursion,⁷
we first replace the concept of the function f by that of the set of pairs
(x, y) with y = f(x).⁸ Then (4) requires the construction of a set P of
pairs (x, y) with the properties


(5) (1, c) ∈ P; from (x, y) ∈ P follows (x’, F(x, y)) ∈ P.


Here it will be prudent to take the smallest such set P, namely the set that
is formed from the pair (1, c) by repeated application of the step from
(x, y) to (x’, F(x, y)).⁹ In order to define a function f by means of this set,
we must prove that for every number x ∈ N there exists exactly one
number y with (x, y) ∈ P. But by complete induction we see from (5) that
such a y exists and is unique. For if we use (5) to construct the elements
of P, we obtain, apart from the pair (1, c), only pairs of the form (x’, z),
and thus, since x’ ≠ 1, it follows from (1, y) ∈ P that y = c. If we now
assume the desired assertion for x, and if (x’, z_1), (x’, z_2) ∈ P, then
z_1, z_2 must be of the form F(x, y_1), F(x, y_2), with (x, y_1), (x, y_2) ∈ P, since
otherwise the pairs could not be constructed; then y_1 = y_2 by the induction
hypothesis and therefore z_1 = F(x, y_1) = F(x, y_2) = z_2. Thus there
exists a function f satisfying (4).


7 The same name is given to certain generalizations of Eq. (4), one of which is
considered below.

8 In IA, §8.4, the functions were directly defined as such sets of pairs. But the concept
of a function can also be defined in other ways, independently of the concept of a relation
(see, e.g., Lorenzen [1]).

9 If we wish to proceed here on the basis of set theory, which is not altogether
necessary, we will define P as the intersection of all sets P satisfying (5) and must then
show, for example, that: if we had (x’, z) ∈ P, z ≠ F(x, y) for all y, then the deletion
of (x’, z) from P would produce a set satisfying (5), in contradiction to the definition of P.

In order to prove that this function f is unique, we now assume that g
is a function satisfying (4), so that g(1) = c, g(x’) = F(x, g(x)). Then
we have g(1) = f(1), and under the hypothesis that g(x) = f(x) we also
have g(x’) = F(x, g(x)) = F(x, f(x)) = f(x’). Thus by complete induction
g(x) = f(x) for all x ∈ N, so that g = f.
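
The set of pairs used in the existence proof can be made concrete for a finite range of first components. The sketch below is only an illustration under our own naming (`smallest_pair_set`, `is_function_on_segment`): it generates the pairs reachable from (1, c) by the step in (5) and then checks that each x has exactly one partner y.

```python
def smallest_pair_set(c, F, bound: int) -> set:
    """Pairs obtained from (1, c) by the step (x, y) -> (x + 1, F(x, y)),
    restricted to first components not exceeding `bound`."""
    P = {(1, c)}
    x, y = 1, c
    while x < bound:
        x, y = x + 1, F(x, y)   # right-hand side uses the old x and y
        P.add((x, y))
    return P

def is_function_on_segment(P: set, bound: int) -> bool:
    """True if every x in 1..bound occurs in exactly one pair of P."""
    return all(sum(1 for (u, _) in P if u == x) == 1
               for x in range(1, bound + 1))

# Example with c = 1 and F(x, y) = (x + 1) * y, i.e. the factorial function.
P = smallest_pair_set(1, lambda x, y: (x + 1) * y, 5)
```

Here `P` consists of the pairs (1, 1), (2, 2), (3, 6), (4, 24), (5, 120), and `is_function_on_segment(P, 5)` confirms that it is the graph of a function on the first five numbers.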


From the proof we see that the theorem is valid under the following weaker 
hypothesis: the values for the second argument of F need not be numbers 
but may form an arbitrary set, quite independent of the values for the first 
argument: this set contains c and the values of F. Of course, we then obtain a 
function f whose values are no longer necessarily numbers but belong to the 
arbitrary set. 

Our principle of recursion can be made more general if we replace (4) by 


(4’) f(1) = c, f(x’) = F_x(f(1), ..., f(x)) for all x ∈ N,


where F_x is a function of x arguments for every natural number x.¹⁰ But this
more general principle can be reduced to (4) by a simple transformation:
namely, with the number x we associate the x-tuple¹¹ (f(1), ..., f(x)) and denote
this mapping by f*, so that


f*(x) = (f(1), ..., f(x)).


It is clear that the function f is uniquely determined by f*. Consequently, 
in order to show the existence and uniqueness of a function f satisfying (4’) 
we need only transform (4’) into conditions on f* that are of the form (4) and 
are therefore satisfied by the mapping f*. For this purpose, in (4) we replace f 
by f* and c by the 1-tuple (c) and define the function F as follows:


F(x, y) = (z_1, ..., z_n, F_n(z_1, ..., z_n)) for y = (z_1, ..., z_n).
Then
F(x, f*(x)) = (f(1), ..., f(x), F_x(f(1), ..., f(x))),


so that after these changes the conditions in (4) become identical with those
of (4’), in view of the fact that


f*(x’) = (f(1), ..., f(x), f(x’)).


But now we must have recourse to the above-mentioned possibility of weakening 
the hypotheses in our original principle of recursion: the arbitrary set in question 
now consists of all n-tuples (z_1, ..., z_n), where n is any natural number and
the z_1, ..., z_n are no longer required to be numbers but only members of a set
containing the arguments and the values of the functions F_n.
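
The reduction of (4’) to (4) by means of the tuples f*(x) can be imitated directly. In the following sketch, which makes no claim beyond illustration, a single Python function `F` accepting any positive number of arguments stands in for the whole family F_1, F_2, ...; the name `course_of_values` is our own.

```python
def course_of_values(c, F):
    """Return f with f(1) = c and f(x + 1) = F(f(1), ..., f(x)).

    F is assumed to accept any positive number of arguments, so that it
    plays the role of every F_x at once.
    """
    def f_star(x: int) -> tuple:
        history = (c,)                           # f*(1) is the 1-tuple (c)
        for _ in range(1, x):
            history = history + (F(*history),)   # f*(x') built from f*(x)
        return history

    # f(x) is the last component of the x-tuple f*(x).
    return lambda x: f_star(x)[-1]

# Example: each new value is the sum of all earlier values.
f = course_of_values(1, lambda *zs: sum(zs))
```

In this example the values f(1), ..., f(5) are 1, 1, 2, 4, 8, each determined by the complete tuple of its predecessors.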


10 For the concept of the number of elements of a set, see §1.5; in the formulation
of (4’) we naturally require the concept of a segment as defined in §1.5.

11 An x-tuple is a mapping of the segment A_x (see §1.5); thus the x-tuple in question
is obtained by restricting the domain of f to A_x.


1.3. Addition

By the principle of recursion and the remarks at the beginning of 
§1.2, there exists exactly one operation, to be denoted by + and called 
addition, which is a function of two arguments, with arguments and 
values in N such that (1), (2) are satisfied for all a, b ∈ N. In other words,
(1), (2) constitute the recursive definition of addition.¹² Addition is
associative:


(6) (a+b)+c=a+(b+ 0c). 


The proof is based on the argument from c to c’: (a + b) + 1 =
(a + b)’ = a + b’ = a + (b + 1); if (6) holds, then (a + b) + c’ = ((a + b) + c)’ =
(a + (b + c))’ = a + (b + c)’ = a + (b + c’). Addition is also commutative:


(7) at+b=b+a. 


For b = 1 the proof is by the argument from a to a’: 1 + 1 = 1 + 1;
if 1 + a = a + 1, then 1 + a’ = (1 + a)’ = (a + 1)’ =
(a + 1) + 1 = a’ + 1. The proof of the general assertion is by the
argument from b to b’ by means of (6) under the induction hypothesis
of (7):


a + b’ = (a + b)’ = (b + a)’ = b + a’ = b + (a + 1)
= b + (1 + a) = (b + 1) + a = b’ + a.
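
The two laws just proved can be spot-checked on small numbers with an addition that uses nothing but (1) and (2). This is a sketch only: the functions `succ` and `add` are illustrative names, and the use of `b - 1` as the predecessor of b is a convenience of the Python model, not part of the axiomatic development.

```python
def succ(n: int) -> int:
    """The successor n' of n."""
    return n + 1

def add(a: int, b: int) -> int:
    """Addition defined only through (1) a + 1 = a' and (2) a + b' = (a + b)'."""
    if b == 1:
        return succ(a)             # equation (1)
    return succ(add(a, b - 1))     # equation (2), with b - 1 as predecessor of b

def laws_hold(limit: int) -> bool:
    """Check (6) and (7) for all numbers from 1 up to `limit`."""
    numbers = range(1, limit + 1)
    associative = all(add(add(a, b), c) == add(a, add(b, c))
                      for a in numbers for b in numbers for c in numbers)
    commutative = all(add(a, b) == add(b, a)
                      for a in numbers for b in numbers)
    return associative and commutative
```

A call such as `laws_hold(8)` examines finitely many cases and therefore only supports, and does not replace, the inductive proofs.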


By (6) we may therefore omit the parentheses in a sum with three
terms. In order to be able to omit them in sums with more than three
terms, we first define the expression \sum_{i=1}^{n} a_i for a given sequence¹³
(a_i)_{i=1,2,...} of numbers a_i recursively by setting


(8) \sum_{i=1}^{1} a_i = a_1, \sum_{i=1}^{n+1} a_i = \sum_{i=1}^{n} a_i + a_{n+1};


for this purpose we need only set c = a_1 and F(x, y) = y + a_{x'} in (4).
In particular, we have \sum_{i=1}^{3} a_i = (a_1 + a_2) + a_3 = a_1 + a_2 + a_3 and
\sum_{i=1}^{4} a_i = (a_1 + a_2 + a_3) + a_4, for which again we naturally write
a_1 + a_2 + a_3 + a_4. Moreover, no parentheses are needed to express
addition of such sums, as is shown by the equation


(9) \sum_{i=1}^{n} a_i + \sum_{i=1}^{m} a_{n+i} = \sum_{i=1}^{n+m} a_i.


12 The recursive definition of addition, in particular (1) and (2), is suitable for
instruction at the end of the secondary school, where it could be presented in a course
on the axiomatization of the natural numbers. In such a course the proofs given in
the present section would be appropriate examples.

13 A sequence of this sort (infinite) is simply a function i → a_i, defined on N, whose
values in this case are also in N.

The proof of this equation is by the argument from m to m’: for m = 1, 
Eq. (9) becomes the second of the equations in (8); and from (9) we see 
by (8) and (6) that 


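\sum_{i=1}^{n} a_i + \sum_{i=1}^{m'} a_{n+i} = \sum_{i=1}^{n} a_i + ( \sum_{i=1}^{m} a_{n+i} + a_{n+m'} )

= ( \sum_{i=1}^{n} a_i + \sum_{i=1}^{m} a_{n+i} ) + a_{n+m'}

= \sum_{i=1}^{n+m} a_i + a_{(n+m)'} = \sum_{i=1}^{(n+m)'} a_i = \sum_{i=1}^{n+m'} a_i.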


We note that none of the properties of the numbers (except where they are 
used as indices) is needed here except property (6). The extension of (7) 
to sums of more than two terms will be proved in §1.5. 

We now prove by means of (9) that any meaningful expression A
constructed from numbers a_1, ..., a_k (in this order), and from + signs
and parentheses has a value, namely \sum_{i=1}^{k} a_i, which is independent of the
distribution of the parentheses (for k = 1 the expression A is to be taken
equal to a_1). For the proof we make use of induction on k in the altered
form of §1.4¹⁴ (with k instead of m and with M as the set of numbers k
for which the assertion is true). By the construction of A there must exist
natural numbers n, m with n + m = k (k ≠ 1) such that for expressions
B, C formed from a_1, ..., a_n and a_{n+1}, ..., a_{n+m} under appropriate
distribution of parentheses, we have the equation A = B + C. Since n, m < k,
the induction hypothesis means that B = \sum_{i=1}^{n} a_i, C = \sum_{i=1}^{m} a_{n+i}, so
that the desired assertion A = \sum_{i=1}^{k} a_i follows from (9). For k = 1, the
assertion is immediately obvious.
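
The decomposition A = B + C used in this proof also yields a simple way of enumerating every distribution of parentheses. The sketch below (with the hypothetical name `all_values`) collects the values of all fully parenthesized expressions over a given sequence of terms. It is run once with string concatenation, which is associative but not commutative, and once with subtraction, which is not associative.

```python
def all_values(terms: tuple, op) -> set:
    """Values of every fully parenthesized expression built from `terms`
    (in the given order) with the binary operation `op`."""
    k = len(terms)
    if k == 1:
        return {terms[0]}
    values = set()
    for n in range(1, k):                     # A = B op C with n + m = k
        for b in all_values(terms[:n], op):
            for c in all_values(terms[n:], op):
                values.add(op(b, c))
    return values

concatenations = all_values(("a", "b", "c", "d"), lambda x, y: x + y)
differences = all_values((8, 4, 2), lambda x, y: x - y)
```

For concatenation only the single value "abcd" arises, which agrees with the remark that property (6) alone is needed; for subtraction the two groupings of 8, 4, 2 give the distinct values 2 and 6.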

If all a_i = a, the sum \sum_{i=1}^{n} a_i = \sum_{i=1}^{n} a is called the nth multiple na of a.
For this multiplication, (9) gives at once the distributive law


(10) na + ma = (n + m)a.

The commutative law for multiplication is dealt with in §1.5, and the
associative law in §2.4. From (8), by the argument from n to n + 1 it
follows directly that n1 = n.
Finally, by the argument from c to c’ we prove the rule:


(11) a = b, if a + c = b + c.


For by (1) the case c = 1 is already dealt with by axiom III; and from
a + c’ = b + c’ it follows from (2) that (a + c)’ = (b + c)’ and
therefore a + c = b + c; thus, by the induction hypothesis we have
a = b, as desired.


14 This anticipation of theorems on order relations (which we have already used in
speaking of “the numbers a_1, ..., a_k”) is permissible here, since the present result is
not used in §1.4.


1.4. Order 


If for the numbers a, b there exists a number c with a + c = b, we
write a < b (a is less than b), or alternatively b > a (b is greater than a).¹⁵
For the relation < defined in this way we have the following theorems:


(12) if a < b, then a ≠ b (antireflexivity);

(13) if a < b and b < c, then a < c (transitivity);

(14) if a < b, then (a + d) < (b + d) (monotonicity of addition);

(15) if a ≠ b, then a < b or b < a.


Rule (12), which states that a + c ≠ a for all a, c, is proved by complete
induction on a, for we have 1 + c ≠ 1 by (1), (7) and axiom IV; and
if we had a’ + c = a’, it would follow that (a + c)’ = a’, and thus
a + c = a. For the proof of (13), (14) we set a + u = b, b + v = c and
thus get c = (a + u) + v = a + (u + v), b + d = (a + u) + d =
a + (u + d) = a + (d + u) = (a + d) + u. Complete induction on a
is again used to prove (15), as follows. The case a = 1 is first dealt with
by complete induction¹⁶ on b: 1 = 1; 1 < 1 + b = b + 1 = b’. Then from
(15) (for all b) the same statement with a’ instead of a (thus for all b:
a’ < b or a’ = b or b < a’) is derived by complete induction on b:
1 < a’; a’ < b’ or a’ = b’ or b’ < a’ by (15) and (14); here again the
induction hypothesis (a’ < b or a’ = b or b < a’) is not used.

15 Note that by numbers we here mean the natural numbers 1, 2, 3, ..., not including
zero.

16 In this case the induction hypothesis is not used at all, a fact which may make the
proof somewhat harder to follow.

From (12), (13) it is easy to see that no two of the statements
a < b, a = b, b < a can be valid at the same time; thus in (15) we can
insert the exclusive “either.” With ≤ as an abbreviation for “< or =”
it follows that a ≤ b is the negation of b < a. From 1 < a’ we see, by
complete induction¹⁶ on a, that


(16) 1 ≤ a,


and thus a < 1 is impossible for any number a. Also

(17) a < b + 1 if and only if a ≤ b.


For from a < b or a = b it follows by (13) that a < b + 1.

On the other hand, if a + c = b + 1, it follows from (16) that we need
consider only the cases c = 1 and c > 1 (that is, c = u + 1 for a certain
u). In the first case we have a = b by (11), and in the second (a + u)’ = b’
and thus a + u = b, or a < b.
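
The definition of < through addition, together with rules (15) to (17), can be tried out on small numbers. In the sketch below the function names are our own, Python integers from 1 upward model the natural numbers, and the search for c is restricted to 1, ..., b - 1, which suffices because a + c = b forces c to be smaller than b.

```python
def less(a: int, b: int) -> bool:
    """a < b iff a + c = b for some natural number c (1 <= c <= b - 1)."""
    return any(a + c == b for c in range(1, b))

def trichotomy_holds(limit: int) -> bool:
    """Exactly one of a < b, a = b, b < a holds for all a, b up to `limit`."""
    return all([less(a, b), a == b, less(b, a)].count(True) == 1
               for a in range(1, limit + 1) for b in range(1, limit + 1))

def rule_17_holds(limit: int) -> bool:
    """a < b + 1 if and only if a < b or a = b, for all a, b up to `limit`."""
    return all(less(a, b + 1) == (less(a, b) or a == b)
               for a in range(1, limit + 1) for b in range(1, limit + 1))
```

In particular `less(a, 1)` is false for every a, since no candidate c exists, which corresponds to the remark following (16).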

From the principle of induction we can now derive the following
modified principle of induction: If the number m is contained in the number
set M whenever n ∈ M for all numbers n < m, then M = N. The induction
hypothesis now reads: “n ∈ M, for all numbers n < m”; and there is no
special initial case. For the proof we consider the set M* of numbers m
with n ∈ M for all n < m. Then the hypothesis of our new principle
simply states that M* ⊆ M. Since n < 1 is not valid for any number n,
we get 1 ∈ M*. By (17), any number n < m’ is < m or = m. If we now
assume that m ∈ M*, then n lies in M not only in the first case but also,
since M* ⊆ M, in the second case as well. Thus we have derived m’ ∈ M*
from m ∈ M*, so that by the argument from m to m’ we have M* = N
and thus also M = N.

With the new principle of induction, it is very easy to prove the theorem 
of well-ordering of the natural numbers: every non-empty set of natural 
numbers contains a smallest number. For the proof we reformulate the 
assertion thus: if the set of numbers M contains the number n, then M
contains a smallest number. If this statement is assumed for all numbers
n < m and if m ∈ M, then M contains a smallest number provided it
contains a number < m. But otherwise m is itself the smallest number in
M. As another method of proving the same theorem, we note that,
if we replace M by the set N − M of the numbers not in M, our modified
principle of induction can be transformed, by contraposition (see IA, §6.6) 
and other purely logical operations, into the desired theorem of well- 
ordering of the natural numbers. 

The principle of induction can also be generalized to complete induction
starting from k, as follows:


If the set M contains the number k and if n’ ∈ M for every number n ≥ k
such that n ∈ M, then M contains all the natural numbers ≥ k.


For the proof we may assume k > 1. We set k = h + 1 and consider
the mapping x → x + h, which maps the set N into the set of natural
numbers > h and thus ≥ k. The inverse mapping [which exists on account
of (11)] takes M into a set for which we may prove, by the ordinary
principle of induction, that it contains the set N. Thus M does in fact
contain all numbers ≥ k. After we have introduced the integers (see the
next section), this proof obviously holds for any arbitrary integer k.


1.5. Segments 

The set of numbers ≤ n is called the segment A_n. By (12) and (13)
we see that A_m ⊂ A_n means the same as m < n. A set M (whose elements
are not necessarily numbers) is said to be finite if it can be mapped
one-to-one onto a segment A_n; that is, if there exists a one-to-one
(invertible)¹⁷ mapping f of M onto¹⁸ A_n. The number n is then uniquely
determined and may therefore be called the number of elements in M.
In order to prove the uniqueness of n, we consider a one-to-one mapping
f of M onto A_n and also a one-to-one mapping g of M onto A_m. By (15)
there is no loss of generality in assuming m < n. If we carry out the
inversion of f and then the mapping g, we obviously obtain a one-to-one
mapping of A_n onto the subset A_m. Our assertion then follows from the
theorem:

A one-to-one mapping f of A_n into itself¹⁹ is a mapping onto A_n.

We prove this theorem by the argument from n to n’. For n = 1 the
assertion is clear, since 1 is the only element of A_1. Now let f be a
one-to-one mapping of A_{n’} into itself. If n’ ≠ f(x) for all x ≤ n, then f induces
a one-to-one mapping of A_n into itself, so that by the induction
hypothesis f(A_n) = A_n. But then we can only have f(n’) = n’, and
consequently f(A_{n’}) = A_{n’}. But if n’ = f(k) for a number k ≤ n, then
by setting


g(x) = f(x) for k ≠ x ≤ n,
g(x) = f(n’) for k = x,


we define a mapping g of A_n into itself, since f(n’) ≠ n’ = f(k) follows
from n’ ≠ k. The one-to-one character of g follows easily from that of f,
so that by the induction hypothesis we have g(A_n) = A_n, and
consequently f(A_{n’}) = A_{n’}.
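
The induction step of this proof is constructive, and the passage from f on the larger segment to g on A_n can be written out. In the following sketch a mapping is modeled as a Python dictionary on the numbers 1, ..., n + 1; the name `reduce_mapping` is hypothetical, and the function assumes without checking that its argument is one-to-one with values in the same segment.

```python
def reduce_mapping(f: dict, n: int) -> dict:
    """From a one-to-one mapping f of {1, ..., n + 1} into itself, build the
    mapping g of {1, ..., n} into itself used in the induction step."""
    top = n + 1                                   # plays the role of n'
    preimages = [x for x in range(1, n + 1) if f[x] == top]
    if not preimages:
        # n' is not the image of any x <= n: f itself induces the mapping.
        return {x: f[x] for x in range(1, n + 1)}
    k = preimages[0]                              # the number k with f(k) = n'
    return {x: (f[top] if x == k else f[x]) for x in range(1, n + 1)}

f = {1: 3, 2: 1, 3: 2}        # a one-to-one mapping of the segment with 3 elements
g = reduce_mapping(f, 2)      # here k = 1, so g(1) = f(3) = 2 and g(2) = f(2) = 1
```

The resulting `g` is again one-to-one, now on the smaller segment, which is exactly what allows the induction hypothesis to be applied.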

17 “One-to-one” or “invertible” means: if f(x) = f(x*), then x = x*. (Cf. IA, §8.4.)
18 “Onto” means: for y ∈ A_n there exists an x ∈ M with y = f(x). (Cf. IA, §8.4.)
19 That is, the set f(A_n) of the images f(x) (x ∈ A_n) is a subset of A_n. (Cf. IA, §8.4.)

If a set M with n elements is mapped onto a set M’ with m elements,
then m ≤ n, as is easily shown by complete induction on n. But if m < n,
the mapping cannot be one-to-one; for a one-to-one mapping of M onto
M’ followed by a one-to-one mapping of M’ onto A_m would show that m
is the number of elements of M. The application of this fact is often called
the Dirichlet pigeonhole principle: the “pigeonholes” are the elements of
M’, into which the “objects” (namely the elements of M) are “inserted”
by the mapping; if there are more objects than pigeonholes, then at least
one pigeonhole must contain two different objects. 
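
The pigeonhole principle guarantees that a collision exists; finding one is then a matter of a single pass over the objects. The following is a minimal sketch, with `find_collision` and `hole_of` as illustrative names of our own.

```python
def find_collision(objects, hole_of):
    """Return two different objects placed in the same pigeonhole,
    or None if the mapping `hole_of` is one-to-one on `objects`."""
    occupant = {}                      # pigeonhole -> first object found there
    for obj in objects:
        hole = hole_of(obj)
        if hole in occupant:
            return occupant[hole], obj
        occupant[hole] = obj
    return None

# Eleven objects, ten pigeonholes (the last decimal digit of each number).
pair = find_collision(range(1, 12), lambda x: x % 10)
```

Here the numbers 1 and 11 share the pigeonhole 1, so `pair` is `(1, 11)`.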

In general, two sets are said to be equivalent (cf. IA, §7.3) if either of them
can be mapped one-to-one onto the other. Thus the finite sets are defined
as those sets that are equivalent to the segments A_n. For convenience,
the empty set ∅, which contains no element at all, is also said to be a
finite set. In an infinite set M, namely a set which is not finite, it is easy to
determine a subset which is equivalent to the set N of all natural numbers:
for if f is a mapping which to each non-empty subset Y of the set M
assigns²⁰ an element f(Y) of the subset Y, the sets M_1, M_2, ...
can be defined recursively by M_1 = {f(M)}, M_{n’} = M_n ∪ {f(M − M_n)},
and then the union of the M_n provides us with the desired subset N*.
Thus, since x → x + 1 is a one-to-one mapping of N onto a proper
subset of N, there also exists a one-to-one mapping g of N* (⊆ M) onto
a proper subset of N*. If each element of M − N* is assigned to itself,
the mapping g is thereby extended to a one-to-one mapping of M onto
a proper subset of M. In view of the preceding theorem and the fact
that ∅ has no proper subset, we have the result:²¹

A set M is finite if and only if there exists no proper subset of 
M equivalent to M. 

As a counterpart to the above theorem on the mappings of $A_n$ we
prove:

From $f(A_n) = A_n$ it follows that f is one-to-one. For x ≤ n we determine
the smallest number y with f(y) = x and denote by g the mapping of $A_n$
into itself thus defined, so that we have y = g(x). From g(x) = g(x*) it
follows that x = f(g(x)) = f(g(x*)) = x*, so that g is one-to-one and
therefore $g(A_n) = A_n$ (by the first theorem in §1.5). Thus for y, y* with
y, y* ≤ n we can always find x, x* with x, x* ≤ n such that y = g(x),
y* = g(x*). Then f(y) = f(y*) implies x = f(y) = f(y*) = x* and thus
y = y*.
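
For small n the statement just proved can also be checked by exhaustive search. The following Python sketch is not part of the original text; the function names are hypothetical, and a mapping f of $A_n$ into itself is represented (an assumption of the sketch) by the tuple of its values f(1), ..., f(n).

```python
from itertools import product

def segment(n):
    """The segment A_n = {1, ..., n} of the natural numbers."""
    return range(1, n + 1)

def is_onto(f, n):
    """f is the tuple (f(1), ..., f(n)); onto means every y in A_n is an image."""
    return set(f) == set(segment(n))

def is_one_to_one(f):
    """One-to-one means that distinct arguments have distinct images."""
    return len(set(f)) == len(f)

def onto_implies_one_to_one(n):
    """Examine all n**n mappings of A_n into itself."""
    return all(
        is_one_to_one(f)
        for f in product(segment(n), repeat=n)
        if is_onto(f, n)
    )

for n in range(1, 6):
    print(n, onto_implies_one_to_one(n))
```

Such a search is of course no substitute for the proof, since it covers only the finitely many values of n that are tried.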


1.6. Commutativity in Sums with More Than Two Terms 


By making use of segments, we can now prove the commutative law,
stated above in §1.3, for sums with more than two terms: for every
one-to-one mapping f of $A_n$ onto itself we have

(18) $\sum_{i=1}^{n} a_{f(i)} = \sum_{i=1}^{n} a_i.$


20 The existence of such a mapping follows from the axiom of choice in the theory
of sets (see IA, §7.6). Here and below, M ∪ M′ denotes the union of the sets M, M′
(that is, the set of elements which lie in M or M′) and {a} denotes the set consisting of
the element a alone (cf. IA, §7.2).

21 Taken by Dedekind as the definition of “finite set.”
The proof by induction on n will only be indicated here; we confine
ourselves to the case²² f(k′) = n′, k + h = n:

$$
\begin{aligned}
\sum_{i=1}^{n'} a_{f(i)} &= \sum_{i=1}^{k} a_{f(i)} + a_{f(k')} + \sum_{i=1}^{h} a_{f(k'+i)} = \sum_{i=1}^{k} a_{g(i)} + a_{n'} + \sum_{i=1}^{h} a_{g(k+i)} \\
&= \sum_{i=1}^{k} a_{g(i)} + \sum_{i=1}^{h} a_{g(k+i)} + a_{n'} \\
&= \sum_{i=1}^{n} a_{g(i)} + a_{n'} = \sum_{i=1}^{n} a_i + a_{n'} = \sum_{i=1}^{n'} a_i,
\end{aligned}
$$

where the one-to-one mapping g of $A_n$ onto itself is defined by
g(i) = f(i) for i ≤ k, g(k + i) = f(k′ + i) for i ≤ h.


For any finite index set I ≠ ∅, any sequence of numbers²³ $(a_i)_{i \in I}$ and
any one-to-one mapping f of $A_n$ onto I, it is easy to see from (18)
that $\sum_{i=1}^{n} a_{f(i)}$ does not depend on f but only on the given sequence, so
that we can write this sum in the shorter form $\sum_{i \in I} a_i$. From (9) we have
in this notation²⁴

(19) $\sum_{i \in I'} a_i + \sum_{i \in I''} a_i = \sum_{i \in I} a_i$, if $I = I' \cup I''$, $I' \cap I'' = \emptyset$.


For the case I′ = ∅ we define $\sum_{i \in I'} a_i$ as 0, where 0 is the neutral element
of addition (as defined below in §2.3); then it is obvious that (19) still
holds.
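
The independence of the sum from the enumeration of the index set, and the rule (19), can be illustrated on a small example. The following Python sketch is only an illustration: the family `a`, indexed by pairs as footnote 23 permits, and all names in it are hypothetical and do not occur in the text.

```python
from itertools import permutations

def sum_over(a, enumeration):
    """Add the terms a[i] one after another in the order given by `enumeration`.

    For an empty enumeration the result is 0, in agreement with the
    convention adopted for the empty index set.
    """
    total = 0
    for i in enumeration:
        total = total + a[i]
    return total

# Hypothetical family (a_i) whose indices are pairs of numbers.
a = {(1, 1): 4, (1, 2): 7, (2, 1): 1, (2, 2): 9}
I = list(a)

# (18): every enumeration of the index set I yields the same sum.
sums = {sum_over(a, order) for order in permutations(I)}

# (19): I is the union of the disjoint sets I1 and I2.
I1 = [i for i in I if i[0] == 1]
I2 = [i for i in I if i[0] == 2]
lhs = sum_over(a, I1) + sum_over(a, I2)

print(sums, lhs)
```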


By complete induction on n we further obtain from (19)

(20) $\sum_{k=1}^{n} \sum_{i \in I_k} a_i = \sum_{i \in I} a_i$, if $I = \bigcup_{k=1}^{n} I_k$ and $I_k \cap I_h = \emptyset$ for k ≠ h.


22 The cases f(1) = n’ and f(n’) = n’ require only slight changes. 

23 That is, a mapping of I into N. The indices are not necessarily natural numbers;
for example, we could also use pairs of numbers (i, k), in which case (provided there
is no danger of misunderstanding) we may write $a_{ik}$ instead of $a_{(i,k)}$, and
correspondingly for triples or n-tuples.

In school one often introduces sequences without any mention of their connection 
with functions. But the concept of a function would be more clearly understood if 
infinite sequences were presented as mappings of N into N or into some set of numbers. 

24 M ∩ M′ denotes the intersection of the sets M, M′, namely, the set of elements
that belong to M and to M′; for the definition of ∪ see footnote 20, page 103. (Cf. also
IA, §7.2.)
If for I we take the set of pairs (i, k) with i ≤ m, k ≤ n, and for $I_k$ we
first take the set of (i, k) with i ≤ m, and then the set of (k, i) with i ≤ n,
we have

(21) $\sum_{k=1}^{n} \sum_{i=1}^{m} a_{ik} = \sum_{i \le m,\, k \le n} a_{ik} = \sum_{k=1}^{m} \sum_{i=1}^{n} a_{ki}.$


Setting $a_{ik} = 1$ gives us (in view of m · 1 = m, n · 1 = n) nm = mn, the
commutative law of multiplication. Of course, this law could also be
proved by complete induction (first on n with m = 1 and then on m)
but its derivation from the general equation (21) is shorter and corresponds
exactly to the usual intuitive argument for the commutative law
of multiplication: namely, the nm summands 1 are arranged in m lines
and n columns and then added line by line.
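
The intuitive argument with lines and columns can be written out as a short Python sketch. It is only an illustration; the function names are hypothetical, and the indices run from 0 (as is usual in Python) rather than from 1.

```python
def sum_line_by_line(a, m, n):
    """For each of the m lines, add its n entries; then add the line sums."""
    return sum(sum(a[i][k] for k in range(n)) for i in range(m))

def sum_column_by_column(a, m, n):
    """For each of the n columns, add its m entries; then add the column sums."""
    return sum(sum(a[i][k] for i in range(m)) for k in range(n))

def ones(m, n):
    """The scheme of m lines and n columns in which every summand is 1."""
    return [[1] * n for _ in range(m)]

a = [[3, 1, 4], [1, 5, 9]]          # hypothetical scheme with m = 2, n = 3
print(sum_line_by_line(a, 2, 3), sum_column_by_column(a, 2, 3))
print(sum_line_by_line(ones(4, 7), 4, 7), sum_column_by_column(ones(4, 7), 4, 7))
```

For the scheme of ones, the first sum is built from m lines of n summands and the second from n columns of m summands, which is the content of nm = mn.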


2. The Integers 


2.1. Properties Required in an Extension of the Concept of Number 


From now on the symbol a′ will no longer be used, as in §1, to denote the
successor of a, which will always be written in the form a + 1. If a, a′
are arbitrary natural numbers, then in the case a ≤ a′ there exists no
natural number x with


(22) x+a’ =a. 


But now we wish to proceed to a domain of numbers in which an equation
(22) always has a solution.²⁵ Let us assume that we have already succeeded
in finding an extension of the domain of natural numbers in which addition 
is defined in such a way as to satisfy the laws (6), (7), (11) and, when 
applied to the natural numbers, to agree with the addition already defined 
for them. Of course, it will be necessary to prove later that this assumption 
is justified. But first let us reflect a little on the properties that such an 
extended domain must have, since in this way we will obtain valuable 
hints for the construction of the domain. 

Whenever we extend a domain of numbers, here and in similar situations 
below, we shall always require that certain rules of calculation remain 
valid, a requirement called the principle of permanence.²⁶ But this label
should not mislead us into thinking that the principle of permanence 
justifies once and for all the assumption that such extensions exist. 
Moreover, it fails to tell us which rules of calculation are to be “preserved.” 


25 In school it is usual to begin with the requirement that the equation xa′ = a has
a solution. For this procedure see the end of §3.2. 
26 Often associated with the name of H. Hankel. 
It merely confirms the fact of experience that, in making the extensions of
the concept of number which from time to time have become necessary, 
mathematicians have found it convenient to preserve the most important 
rules of calculation; whether this is possible, and to what extent, must be 
investigated in each special case. The principle of permanence gives us 
only a very weak indication of how we ought to proceed, and it by no 
means deserves the key position it has often been given, without any 
logical justification. 

If we denote by²⁷ a − a′ the solution of (22) in the extended domain,
then

(23) (a, a′) → a − a′

is a mapping of the set of all pairs of natural numbers onto the extended
domain, and we must ask: when do two pairs (a, a′), (b, b′) have the same
image in this mapping? From the equations (a − a′) + a′ = a and
(b − b′) + b′ = b, which characterize a − a′ and b − b′, it follows that

(b − b′) + b′ + a = b + (a − a′) + a′,

so that

(24) a + b′ = a′ + b

means the same as b − b′ = a − a′. By (22) we must also set

(25) c = (c + a′) − a′.

Finally, we have

a′ + b′ + (a − a′) + (b − b′) = a + b,

and therefore

(26) (a − a′) + (b − b′) = (a + b) − (a′ + b′).

If now in the set of pairs we define addition by the (clearly associative
and commutative) rule

(a, a′) + (b, b′) = (a + b, a′ + b′)

and denote the mapping (23) by f, we can write (26) in the form

(27) f(A) + f(B) = f(A + B),

where we have used capital letters for the pairs.


27 This symbol is still completely at our disposal, since we have not used it before. 
A mapping f with the property (27) (for all elements A, B of the set
of preimages, consisting here of all pairs of natural numbers) is called a
homomorphism (with respect to addition).

Concerning the homomorphism (23), we have the following fact, which
can be expressed in terms of the natural numbers alone: the pairs
A = (a, a′), B = (b, b′) have the same image under f if and only if (24)
holds. For (24) we also write A ≡ B, since we shall see below that the
relation ≡ defined in this way has many properties in common with
equality.
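
These preliminary considerations can be tried out numerically if, purely for the sake of illustration, we let the built-in integers of Python play the part of the extended domain whose existence is assumed here. The following sketch rests on that assumption; the names `f`, `add_pairs` and `equivalent` are hypothetical.

```python
from itertools import product

def f(pair):
    """The mapping (23): (a, a') -> a - a', with Python ints as the assumed extended domain."""
    a, a1 = pair
    return a - a1

def add_pairs(A, B):
    """Componentwise addition of pairs, the rule stated just before (27)."""
    return (A[0] + B[0], A[1] + B[1])

def equivalent(A, B):
    """Condition (24): a + b' = a' + b, which involves natural numbers only."""
    return A[0] + B[1] == A[1] + B[0]

pairs = list(product(range(1, 7), repeat=2))   # pairs of natural numbers up to 6

# (27): f is a homomorphism with respect to addition.
homomorphic = all(f(A) + f(B) == f(add_pairs(A, B)) for A in pairs for B in pairs)

# Two pairs have the same image exactly when (24) holds.
same_image_iff_24 = all((f(A) == f(B)) == equivalent(A, B) for A in pairs for B in pairs)

print(homomorphic, same_image_iff_24)
```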


2.2. Construction of the Extended Domain


After these preliminary remarks we can see that the first part of our
task consists of constructing, together with its set of images, a homomorphism
f of the set of pairs of natural numbers in such a way that pairs A, B
have the same image if and only if A ≡ B, with ≡ defined as above. The
image of the pair A = (a, a′) is created by simply setting between the
numbers a, a′ a horizontal stroke: a − a′. For the moment this stroke,
which is now introduced for the first time, does not have the meaning
of a minus sign, although it will naturally acquire that meaning later,
when we have made the necessary definitions: at present a − a′ is nothing
but a symbol formed from the two numbers a, a′.²⁸ But now the real work
begins, since the symbols a − a′ are completely useless until we have
introduced for them the concepts of equality and addition.

These concepts must be introduced in such a way that every statement
about a − a′, b − b′, ..., formed with the symbols = and +, is an
abbreviation for a statement about the natural numbers a, a′, b, b′, ..., and
in view of the fact that we shall be interested only in statements that could
finally be reduced to = and +, the new symbols are in principle superfluous,
since they could be eliminated from every statement. But they
provide us with a much more convenient notation, so that their use is to be
recommended on practical grounds.²⁹ As statements about the new
symbols a − a′, b − b′, ... we shall admit only statements about the
natural numbers a, a′, b, b′, ... that remain unchanged in truth-value
(cf. the corresponding remarks in §3.1) when the a − a′, b − b′, ... are


28 It makes no difference here whether we regard the natural numbers as complicated
logical expressions (sets of equivalent sets) or simply as symbols like | (see §1.1).

29 Of course, we could dispense with these new symbols altogether and work merely
with the pairs (a, a′), which would then be called integers and for which we would
introduce ≡ as the new relation of equality. But then there is the difficulty that we
would like to use the ordinary symbol of equality, introduced before for the natural
numbers, for the integers also; for pairs this symbol has already been used in a different
sense (see IA, §7.2). The use of classes of pairs of numbers instead of the symbols
a − a′ is discussed on p. 109.
replaced by any other symbols equal to them (in the sense of equality
defined immediately below). 


It is to be noted that for the following developments we require only the 
properties (6), (7), (11) of the natural numbers and their addition. This fact 
will become important in §3.1, where the procedure described here is applied 
to multiplication instead of addition (with a/a′ instead of a − a′). Since
in that section we shall be introducing only pairs (a, a′) with a′ ≠ 0, we make
the further remark here that in what follows (as can easily be seen from the
proofs) the rule (11) is needed only for natural numbers c restricted to a proper
subset C, provided that the pairs (a, a′) are restricted to a′ ∈ C and for a′, b′ ∈ C
we have a′ + b′ ∈ C; the only exception is the proof of the existence of the
inverse element at the beginning of §2.3, where we must also require a ∈ C.


It is now clear how equality is to be defined, in view of the requirement
that if f is the mapping (23), then f(A) = f(B) must mean the same as
A ≡ B. This requirement is met if we stipulate that a − a′ = b − b′
means the same as (24). But now, if we wish to calculate with the new
concept of equality in the same way as with equality for natural numbers,
we must show that the two fundamental rules for equality are satisfied:
namely, every expression must be equal to itself (reflexivity); and if
each of two expressions is equal to a third, they must be equal to each
other (comparativity). For the relation ≡, to which we have reduced our
definition of equality, these fundamental rules mean that

(28) A ≡ A;
(29) if A ≡ C and B ≡ C, then A ≡ B.

But (28) follows immediately from the fact that (24) holds for a = b,
a′ = b′. As for the proof of (29), we see that by adding b′ to
a + c′ = a′ + c and a′ to b′ + c = b + c′ we obtain the equation
a + b′ + c′ = a′ + b + c′, so that (24) now follows from (11). Only
after (28) has been proved is it clear that (23) is actually a mapping: from
(a, a′) = (b, b′) it follows that a − a′ = b − b′.

We further note that from the definition of equality we have
(a + d) − (a′ + d) = a − a′.

A relation ≡ which satisfies (28) and (29) is called an equivalence
relation (see also IA, §8.5). Such a relation is necessarily symmetric and
transitive; for if C = A we see from (29) and (28) that B ≡ A implies
A ≡ B; and if on the basis of this symmetry we replace B ≡ C by
C ≡ B in (29), we obtain the desired transitivity. From A ≡ B, C ≡ A,
and B ≡ D we obtain C ≡ D by a twofold application of transitivity. By
symmetry and by the definition of equality this result can also be expressed
in the form: A statement a − a′ = b − b′ is not changed in truth value
if a − a′ and b − b′ are replaced by their equals c − c′ and d − d′.
If to every A we assign the set $\bar{A}$ of all X with X ≡ A, then $\bar{A} = \bar{B}$ means
the same as A ≡ B; for by (28) we have $A \in \bar{A}$, so that $\bar{A} = \bar{B}$ immediately
implies A ≡ B; and conversely, if A ≡ B, then $X \in \bar{A}$ implies $X \in \bar{B}$ by transitivity
and $X \in \bar{B}$ implies $X \in \bar{A}$ by (29). Thus, instead of the a − a′ we could simply
use the $\bar{A}$, which are called residue³⁰ classes with respect to the equivalence
relation. A special definition of equality is no longer required, since we have
already given a general definition for equality of sets (see IA, §7.2).

Under certain systems of logic the latter possibility appears to be of essential 
importance, but this fact does not permit us to conclude that the integers 
must be considered as sets of pairs. In fact, our construction of the symbols 
a — a’ corresponds more closely to the way in which the integers are actually 
used in daily life; when faced with an expression like 2 − 3, we seldom think
of the set of all pairs (x, x′) such that (x, x′) ≡ (2, 3). Moreover, such a set
of pairs of natural numbers is in no sense “more real” than our symbols: for
this set of pairs can only be defined by the propositional form (x, x′) ≡ (2, 3),
where the variables x, x′ are quantified by some prefixed symbol (see IA, §7.7);
in other words the set of pairs can only be defined by a symbol that is considerably
more complicated than 2 − 3. The question “What is an integer?” has no
absolute significance; it can be meaningfully asked only in the framework 
of a given system for the foundations of mathematics. The unconditionally 
meaningful question is: “How do we obtain mathematical objects that behave 
in such and such a way?” And to this question it is possible to give the most 
varied answers. 


In the domain of our new symbols a − a′, which henceforth we shall
call integers, it is now our task to introduce addition in such a way that
(27) is valid. At first glance (26) seems to constitute such a definition:

(30) (a − a′) + (b − b′) = (a + b) − (a′ + b′).

But the sum must actually depend only on the summands, whereas here
it seems to depend on a, a′, b, b′; in other words: equals added to equals
must give equals, or expressed still otherwise: the truth value of (30) must
not be altered if the numbers occurring there are replaced by other
numbers equal to them. Thus we must prove that A ≡ C, B ≡ D always
implies A + B ≡ C + D;³¹ this condition, which alone makes the
definition (30) useful, is called consistency of ≡ with addition. For
the proof it will be sufficient, on account of commutativity, to prove the
simpler condition

(31) A + B ≡ C + B, if A ≡ C:

for then from B ≡ D we will have C + B ≡ C + D, and therefore, by
transitivity, A + B ≡ C + D. But now addition of b + b′ to


30 The name comes from the use of this concept in the theory of divisibility (IB5,
§3.6 and IB6, §4.1). For general information on equivalence relations and their sets of
residue classes, see IB10, §1.5.

31 Cf. IA, §8.5, and particularly IA (8.9).
a + c′ = a′ + c gives (a + b) + (c′ + b′) = (a′ + b′) + (c + b), which
proves (31). The addition of integers defined by (30) is obviously commutative
and associative.
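
The two fundamental rules (28), (29) and the consistency of the relation on pairs with addition can be checked by exhaustive search over a small range. The following Python sketch uses natural numbers only; the names `equiv` and `add` and the bound 4 are hypothetical choices made for the illustration.

```python
from itertools import product

def equiv(A, B):
    """The relation (24) on pairs: a + b' = a' + b."""
    (a, a1), (b, b1) = A, B
    return a + b1 == a1 + b

def add(A, B):
    """Addition of pairs, corresponding to definition (30)."""
    return (A[0] + B[0], A[1] + B[1])

pairs = list(product(range(1, 5), repeat=2))   # the 16 pairs with entries 1..4

# (28): reflexivity.
reflexive = all(equiv(A, A) for A in pairs)

# (29): comparativity.
comparative = all(
    equiv(A, B)
    for A, B, C in product(pairs, repeat=3)
    if equiv(A, C) and equiv(B, C)
)

# Consistency with addition: equals added to equals give equals.
consistent = all(
    equiv(add(A, B), add(C, D))
    for A, B, C, D in product(pairs, repeat=4)
    if equiv(A, C) and equiv(B, D)
)

print(reflexive, comparative, consistent)
```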

Now we must see to it that certain integers are equal to natural numbers.
To do this we define equality between integers and natural numbers,
in accordance with (25), by setting:³²

(32) (c + a′) − a′ = c, c = (c + a′) − a′,

with the stipulation that these equations, and only these, are to hold
between natural numbers and integers. We must now verify that the two
fundamental laws for equality are still valid. Reflexivity remains unaffected
by (32), but for the comparativity “if α = γ and β = γ, then α = β,”
we must distinguish the various cases arising from the fact that each of the
letters α, β, γ may represent either a natural number or an integer. Of the
eight possible cases we no longer need to examine those in which α, β, γ are
of the same kind. In view of the symmetry (32) of equality, we can also
strike out those cases that arise from others if α and β are interchanged. Thus the
following four cases remain:


1. β, γ are the integers b − b′, c − c′, and α is the natural number a.
The assumption α = γ can be satisfied only on the basis of (32) and thus
implies c = a + c′. Consequently, β = γ implies b + c′ = b′ + a + c′
and also, by (11), b = b′ + a, which finally, by (32), gives a = b − b′,
and therefore α = β.

2. α, β are the integers a − a′, b − b′, and γ is the natural number c.
Then by (32), α = γ, β = γ imply a = c + a′, b = c + b′, from which
follows a + b′ = a′ + b, and therefore α = β.

3. α, β are the natural numbers a, b, and γ is the integer c − c′. Then α = γ,
β = γ imply by (32) that c = a + c′, c = b + c′, from which we see
that a + c′ = b + c′, so that finally, by (11), we have a = b and therefore
α = β.

4. α, γ are the natural numbers a, c, and β is the integer b − b′. Then
by (32), β = γ implies b = c + b′ and therefore, since α = γ, we have
b = a + b′, which by (32) gives a = b − b′ and therefore α = β.


Thus we may in fact consider the domain of the integers as an extension 
of the domain of the natural numbers, since every natural number is 
actually equal to an integer. But now a new difficulty arises: the sum of
two natural numbers a, b can be determined in two different ways, namely,
first as a + b and secondly as the integer ((a + c) − c) + ((b + d) − d).


32 In (32) it is necessary to adjoin the second equation in order that equality may be 
symmetric. 
But by (30) the latter number is equal to (a + b + c + d) − (c + d), and
therefore again to a + b. Thus everything is in order.

In the domain of the integers it is now true that every equation (22) has
a solution, namely the integer a − a′:

(a − a′) + a′ = (a − a′) + ((a′ + c) − c) = (a + a′ + c) − (a′ + c) = a.

In §2.3 we shall see that addition of integers actually has the property (11).
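
The construction up to this point can be sketched as a small Python class. This is only one possible model, given under stated assumptions: the class name `Diff` is hypothetical, natural numbers are represented by Python ints that are at least 1, and a natural number c that is to be added is first replaced by the symbol (c + 1) − 1, which (32) permits.

```python
class Diff:
    """The symbol a - a' formed from two natural numbers a, a' (both >= 1)."""

    def __init__(self, a, a1):
        assert a >= 1 and a1 >= 1
        self.a, self.a1 = a, a1

    def __eq__(self, other):
        if isinstance(other, int):
            # Rule (32): (c + a') - a' = c, and no other equalities with naturals.
            return other >= 1 and self.a == other + self.a1
        # Rule (24): a - a' = b - b' means a + b' = a' + b.
        return self.a + other.a1 == self.a1 + other.a

    def __add__(self, other):
        if isinstance(other, int):
            # Replace the natural number c by the equal symbol (c + 1) - 1.
            other = Diff(other + 1, 1)
        # Rule (30): (a - a') + (b - b') = (a + b) - (a' + b').
        return Diff(self.a + other.a, self.a1 + other.a1)

    def __repr__(self):
        return f"Diff({self.a}, {self.a1})"

# The equation x + 5 = 2 has no solution among the natural numbers,
# but the symbol 2 - 5 solves it:
x = Diff(2, 5)
print(x + 5 == 2)
```

Because Python tries the reflected comparison when the left operand is a plain int, an expression such as `3 == Diff(5, 2)` is also evaluated by `Diff.__eq__`, which mirrors the second equation adjoined in (32) for the sake of symmetry.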


2.3. The Module of the Integers

From the definition of equality (24) for the integers it follows at once
that all integers of the form a − a are equal. If for 1 − 1 we introduce
the abbreviation 0, then

(33) a − a = 0,

and therefore 0 + a = a for every natural number a. But this last result
also holds for every integer a − a′; for by (30) we have

(a − a′) + 0 = (a − a′) + (1 − 1) = (a + 1) − (a′ + 1) = a − a′.

Thus we have introduced the number zero and have established its most
important property. From (30), (33) we also have

(a − a′) + (a′ − a) = (a + a′) − (a′ + a) = 0.


Now a given set, together with an operation defined in it (see IB10,
§1.2.2), is called a module³³ if (the operation being denoted by +) the
following conditions are satisfied:


1. The operation is associative and commutative; i.e., we have the
equations (α + β) + γ = α + (β + γ) and α + β = β + α for all elements
α, β, γ of the set.

2. There exists in the set a neutral element for the operation, namely
an element 0 with α + 0 = α for every element α of the set.


3. For every element α of the set there exists an inverse element,³⁴
i.e., an element −α of the set with α + (−α) = 0.


The set is said to be a module with respect to the operation (which is here 
written as addition). 


33 Or also a commutative (or Abelian) group; (cf. IB2, §1.1).

34 Or also simply an inverse; the name comes from the fact that the effect of adding
−α reverses that of adding α: (β + α) + (−α) = β + (α + (−α)) = β. The connection
between the notation −α and the use of the “minus” stroke in a − a′ will be
explained later.
Consequently, the integers form a module under the operation of
addition, namely, the module of integers. The following remarks are
valid for every module;³⁵ here we may, of course, keep in mind the module
of integers that has just been constructed, but we must be careful not to
make use of any of its properties that are not common to all modules,
since we wish to apply our results to other modules.

For given elements α, β in a module there exists exactly one element ξ
with ξ + α = β. For from ξ + α = β it follows that


ξ = ξ + 0 = ξ + (α + (−α)) = (ξ + α) + (−α) = β + (−α),


so that at most one element ξ, namely β + (−α), can satisfy the equation;
on the other hand, for ξ = β + (−α) we have


ξ + α = (β + (−α)) + α = β + ((−α) + α) = β + 0 = β.


Thus rule (11) holds for every module: from α + γ = β + γ it follows
that α = β. For the unique solution ξ = β + (−α) of the equation
ξ + α = β we write β − α, which agrees with the notation for the integers,
since a − a′ is the solution of equation (22). Conversely, this abbreviation
can be used to define the inverse: −α = 0 − α (= 0 + (−α)). Thus the
minus sign is used in two closely related senses: first in β − α as a connective,
i.e., as a notation for a function of two arguments; and second in
−α for a function of one argument, namely the function which to each
element assigns its inverse.
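
Since these remarks are meant to hold in every module, it may be instructive to test them in a module other than that of the integers. The following Python sketch assumes, as an example not taken from the text, the residues 0, 1, ..., N − 1 with addition modulo N; the modulus 12 and all names are hypothetical.

```python
N = 12  # hypothetical modulus; any N >= 1 gives a module

def add(x, y):
    """Addition in the module of residues modulo N."""
    return (x + y) % N

def neg(x):
    """The inverse element of x, so that add(x, neg(x)) == 0."""
    return (-x) % N

def solutions(alpha, beta):
    """All elements xi with xi + alpha = beta, found by exhaustive search."""
    return [xi for xi in range(N) if add(xi, alpha) == beta]

# For every alpha, beta there is exactly one solution, namely beta + (-alpha).
unique = all(
    solutions(alpha, beta) == [add(beta, neg(alpha))]
    for alpha in range(N)
    for beta in range(N)
)
print(unique)
```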

From the uniqueness of the solution of ξ + α = β it also follows that
−α is already completely determined by α and that the equation α +
(−α) = 0 [as well as (−α) + α = 0] is established. Thus we have at once


(34) −(−α) = α


and also, since (α + β) + ((−β) + (−α)) = α + β + (−β) + (−α) =
α + 0 + (−α) = α + (−α) = 0, we may write


(35) −(α + β) = (−β) + (−α).³⁶
Setting −β for β we have, by (34),
(36) −(α − β) = β − α.


Let us now return to the integers. If a’ <a, there exists a natural 
number b with a = a’ + b, so that by (32) we have a — a’ = b and as 


35 In fact, for every group (cf. IB2, §2).
36 The order of the summands on the right-hand side is so chosen that this result is
obviously valid without the assumption of commutativity.
a result we may simply say: a − a′ is a natural number. If a′ = a, then
a − a′ = 0, and in the only remaining case, namely a < a′ [see (15)],
it follows from (36) that a − a′ is the inverse −b of a natural number b.
For natural numbers b, b′ we have b ≠ 0, −b ≠ 0 (since b + 0 ≠ 0),
and b ≠ −b′ (since b + b′ ≠ 0). Consequently, there exist exactly three
kinds of integers: the natural numbers, zero, and the inverses of the
natural numbers. The latter are called negative integers (or numbers of
negative sign³⁷), and then, in contrast, the natural numbers are called
positive integers.


In view of the above remarks, it would also be possible to extend the set of
natural numbers to the module of integers in the following way: For every
natural number n we introduce a new symbol −n, and also the new symbol 0,
for which −n = −m is defined as n = m and there are no other equalities
except 0 = 0. Addition is then defined as follows:


0 + 0 = 0, 0 + m = m + 0 = m, 0 + (−n) = (−n) + 0 = −n,
(−m) + (−n) = −(m + n),
m + (−n) = (−n) + m = k, with n + k = m for n < m,
m + (−n) = (−n) + m = −k, with m + k = n for m < n,
m + (−m) = (−m) + m = 0.


This procedure is conceptually much simpler but has two serious disadvantages:
proofs of the rules for calculation must be divided up into many special cases
and thus become much lengthier, and addition must be required to satisfy
not only (6), (7), (11) but also (12), (15), which, in contrast to (13), (14), cannot
be deduced³⁸ from (6), (7), (11) alone.
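
The division into special cases becomes visible as soon as this alternative definition is written out as a program. The following Python sketch is an illustration under stated assumptions: the tagged tuples ("pos", n), ("zero", 0), ("neg", n) stand for the symbols n, 0, −n, the number k with n + k = m is computed with Python's subtraction of natural numbers, and `to_int` serves only to cross-check the result against the built-in integers.

```python
ZERO = ("zero", 0)

def pos(n):
    """The natural number n (n >= 1)."""
    return ("pos", n)

def neg(n):
    """The new symbol -n for a natural number n."""
    return ("neg", n)

def add(x, y):
    """Addition by the case distinctions listed above."""
    (sx, m), (sy, n) = x, y
    if sx == "zero":
        return y                      # 0 + y = y
    if sy == "zero":
        return x                      # x + 0 = x
    if sx == sy:
        return (sx, m + n)            # m + n, and (-m) + (-n) = -(m + n)
    if m == n:
        return ZERO                   # m + (-m) = (-m) + m = 0
    if m > n:
        return (sx, m - n)            # the summand of larger magnitude gives the sign
    return (sy, n - m)

def to_int(x):
    """Translate to a Python int, used only for cross-checking."""
    sign, n = x
    return {"pos": n, "zero": 0, "neg": -n}[sign]

print(add(pos(2), neg(5)), add(neg(4), pos(4)))
```

Even a single law such as associativity would have to be verified separately for every combination of signs and magnitudes, which is the first of the two disadvantages mentioned above.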


2.4. Multiplication 

In order to define multiplication, we adopt a plan which may at first
sight seem like a detour but has essential advantages over other methods.³⁹
Our task is to define multiplication of integers in such a way that it
satisfies the distributive law and, in the subdomain of the natural numbers,
agrees with multiplication as already defined. The distributive law
a(x + y) = ax + ay will be regarded as a property of multiplication
by a; that is, as a property of the mapping x → ax. So let us first examine
mappings with this property [see (38)]. We begin with a discussion of
multiplication of the natural numbers from this point of view.


37 But this terminology readily gives rise to the common error that —a (for an arbitrary 
integer a) is always a negative integer. 

38 Thus we cannot use this procedure in §3.1. 

39 Two other possible procedures are described on p. 119 (in small print).
By the distributive law (10) and the commutative law as proved in §1.6,
the mapping f of the set of natural numbers into itself defined by


(37) f(x) = ax
has the property
(38) f(x + y) = f(x) + f(y)


and is therefore a homomorphism with respect to addition. If f and g are
two such homomorphisms, it follows from f(1) = g(1) that f(x) = g(x)
holds for all natural numbers⁴⁰ x, and thus f = g; for f(x) = g(x) implies,
since (38) holds for g in place of f, that f(x + 1) = f(x) + f(1) =
g(x) + g(1) = g(x + 1). Thus


(39) f → f(1)


is a one-to-one mapping of the set of homomorphisms in question onto
the set of natural numbers, so that in particular each of these homomorphisms
has the form (37):


(40) f(x) = f(1) x.


Thus we have obtained a description of multiplication which is very 
suitable for extension to the domain of integers. In what follows, lower- 
case italic letters will refer to arbitrary integers or, when the argument is 
applicable to modules in general, to arbitrary elements of a given module. 

Homomorphisms (with respect to addition) of a module M into itself
are called endomorphisms of the module. Since we wish to use these 
endomorphisms, as suggested by (40), in defining multiplication for the 
module of integers, let us first examine in a general way the set of 
endomorphisms of a module. 

If f and g are endomorphisms of the module M, the mapping
x → f(x) + g(x) is also an endomorphism: for


f(x + y) + g(x + y) = f(x) + f(y) + g(x) + g(y)
= (f(x) + g(x)) + (f(y) + g(y)).


This mapping is called the sum f + g of the endomorphisms. We now
show that with this definition of addition the endomorphisms themselves
form a module. To begin with, associativity and commutativity are clear
at once. The neutral element is the endomorphism x → 0, which we
denote by O: (f + O)(x) = f(x) + O(x) = f(x) + 0 = f(x).⁴¹ When there

40 No use is made here of the fact that f(x) and g(x) are natural numbers.


41 Of course, an expression like (f + g)(x) does not denote any sort of product but
rather the value of the function f + g for the argument x.
is no danger of confusion with the number 0, we may write 0 instead of O.
Finally, if f is an endomorphism, then by (35) the mapping x → −f(x)
is also an endomorphism, which is denoted by −f and is the inverse of f:
(f + (−f))(x) = f(x) + (−f)(x) = f(x) + (−f(x)) = 0 = O(x).

But now, given two endomorphisms, it is possible to define not only
their addition but also another operation on them, namely, successive
application: the mapping denoted⁴² by f ∘ g and defined by


(f ∘ g)(x) = f(g(x)) for all x ∈ M


is again an endomorphism for given endomorphisms f and g; for we have

(f ∘ g)(x + y) = f(g(x + y)) = f(g(x) + g(y))
= f(g(x)) + f(g(y)) = (f ∘ g)(x) + (f ∘ g)(y).

The operation ∘ will be called multiplication. Like every case of successive
application of two mappings (cf. IB2, §1.2.5) this multiplication is
associative:


((f ∘ g) ∘ h)(x) = (f ∘ g)(h(x)) = f(g(h(x))),
(f ∘ (g ∘ h))(x) = f((g ∘ h)(x)) = f(g(h(x)))
for all x and thus
(f ∘ g) ∘ h = f ∘ (g ∘ h).


But it also satisfies the two⁴³ distributive laws with respect to addition


f ∘ (g + h) = f ∘ g + f ∘ h, (f + g) ∘ h = f ∘ h + g ∘ h;


since


(f ∘ (g + h))(x) = f((g + h)(x)) = f(g(x) + h(x))
= (f ∘ g)(x) + (f ∘ h)(x) = (f ∘ g + f ∘ h)(x),


((f + g) ∘ h)(x) = (f + g)(h(x)) = (f ∘ h)(x) + (g ∘ h)(x)
= (f ∘ h + g ∘ h)(x).


42 The symbol f ∘ g may be read as “f after g” or also “f times g.” The small circle
is often omitted but we will retain it here in order to emphasize the difference between
this operation and the operation defined by (fg)(x) = f(x) g(x) in case a multiplication
has been defined in the set of images. If, as is often done, we write the mapping to the
right of the object to be mapped, namely, xf or $x^f$ instead of f(x), then the definition
of successive application is changed, to the effect that f ∘ g denotes the application first
of f and then of g [see IB2, §1.2.5 (4)].

43 Since multiplication of endomorphisms of an arbitrary module is not necessarily 
commutative (the noncommutative linear mappings in IB3, §2.2 are endomorphisms), 
these two laws must here be proved separately. 
By a ring we mean a set of elements with a pair of operations that
have the following properties: 


1. With respect to the first operation, called addition, the ring is a 
module. 


2. The second operation, called multiplication, obeys the associative 
law, and also the two distributive laws with respect to addition. 


Thus the above results on the endomorphisms of a module can be 
summarized in words: the set of endomorphisms of a module forms a ring 
with respect to addition and multiplication. This ring is called the ring of 
endomorphisms of the module. It has a neutral element for multiplication, 
namely the identity mapping x → x, which we shall denote by I:⁴⁴


(I ∘ f)(x) = I(f(x)) = f(x), (f ∘ I)(x) = f(I(x)) = f(x).


A neutral element for multiplication in a ring (there exists at most one
such element; see IB5, §1.6) is called the unit element of the ring.
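
A ring of endomorphisms in which multiplication is not commutative can be exhibited with very little machinery. The following Python sketch assumes, as an example not taken from the text, the module of pairs of integers with componentwise addition; the endomorphisms `swap` and `shear` and the helper `same`, which compares two mappings on finitely many sample points only, are hypothetical names.

```python
def add(f, g):
    """Sum of endomorphisms: (f + g)(v) = f(v) + g(v), added componentwise."""
    return lambda v: (f(v)[0] + g(v)[0], f(v)[1] + g(v)[1])

def compose(f, g):
    """Successive application: (f o g)(v) = f(g(v))."""
    return lambda v: f(g(v))

def identity(v):
    """The unit element I of the ring of endomorphisms."""
    return v

def swap(v):
    """The endomorphism (x, y) -> (y, x)."""
    return (v[1], v[0])

def shear(v):
    """The endomorphism (x, y) -> (x + y, y)."""
    return (v[0] + v[1], v[1])

SAMPLE = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]

def same(f, g):
    """Compare two mappings on the sample points (a check, not a proof)."""
    return all(f(v) == g(v) for v in SAMPLE)

left = same(compose(shear, add(swap, identity)),
            add(compose(shear, swap), compose(shear, identity)))
right = same(compose(add(swap, shear), shear),
             add(compose(swap, shear), compose(shear, shear)))
commute = same(compose(swap, shear), compose(shear, swap))
print(left, right, commute)
```

On the sample points both distributive laws are confirmed, whereas the two products of `swap` and `shear` differ, so that in such a ring the two laws cannot be derived from one another.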

Let us now return to the module of integers. The mapping (39) is seen
to be a one-to-one mapping of the set of endomorphisms onto this module.
For, as was proved at the beginning of this section, the equation
f(x) = g(x) follows from f(1) = g(1) for all natural numbers x. But
from (38) we have f(0) = f(0) + f(0), and thus f(0) = 0 and further
f(x) + f(−x) = f(0) = 0; consequently, f(−x) = −f(x) and also
g(0) = 0, g(−x) = −g(x), so that f(x) = g(x) holds for all integers.
Thus the endomorphism f is already completely determined by the value
of f(1): the mapping (39) is one-to-one and can therefore be inverted.

The set of images in (39) or, in other words, the set of numbers f(1)
for all endomorphisms f, includes the number 1 (as the image of 1 under I).
If it includes f(1), it also includes f(1) + 1 = (f + I)(1); therefore it
includes all the natural numbers. Since, in fact, 0 = O(1) and −f(1) = (−f)(1),
this set includes all the integers. Consequently, (39) is in fact a mapping
onto the module of integers.

Thus we may use (40) to define multiplication for all integers in such
a way that for the domain of natural numbers it agrees with the multiplication
already defined at the beginning of this section: the product ax is
defined as the image of x under that endomorphism which takes 1 into a.
The rules for calculating with multiplication can now be obtained very
simply from the properties of the ring of endomorphisms; since (39) is a
homomorphism with respect to addition and multiplication, f + g
becomes (f + g)(1) = f(1) + g(1) and f ∘ g becomes (f ∘ g)(1) = f(1) g(1).
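
This definition of the product can be imitated in a few lines of Python. The sketch below is only an illustration, with the built-in integers standing in for the module of integers; the names `endomorphism` and `times` are hypothetical. The endomorphism with f(1) = a is built from addition and the formation of inverses alone, in the way suggested by f(x + 1) = f(x) + f(1), f(0) = 0 and f(−x) = −f(x).

```python
def endomorphism(a):
    """Return the endomorphism f of the integers with f(1) = a."""
    def f(x):
        if x < 0:
            return -f(-x)           # f(-x) = -f(x)
        total = 0                   # f(0) = 0
        for _ in range(x):
            total = total + a       # f(x + 1) = f(x) + f(1)
        return total
    return f

def times(a, x):
    """The product ax: the image of x under the endomorphism taking 1 into a."""
    return endomorphism(a)(x)

f, g = endomorphism(3), endomorphism(-4)
print(times(3, -4), f(g(1)))
```

The last line compares the product f(1) g(1) with (f ∘ g)(1), in accordance with the statement that (39) is a homomorphism with respect to multiplication.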


44 If there is no danger of confusion with the number 1, we may also write 1 instead
of I.
Thus the associative law and the two distributive laws⁴⁵ carry over to the
integers:


f(1)(g(1) h(1)) = f(1)((g ∘ h)(1)) = (f ∘ (g ∘ h))(1) = ((f ∘ g) ∘ h)(1)
= ((f ∘ g)(1)) h(1) = (f(1) g(1)) h(1),


(f(1) + g(1)) h(1) = ((f + g)(1)) h(1) = ((f + g) ∘ h)(1)
= (f ∘ h + g ∘ h)(1)
= (f ∘ h)(1) + (g ∘ h)(1) = f(1) h(1) + g(1) h(1).


Thus the integers form a ring with respect to addition and multiplication. 
Since by (39) the endomorphism I corresponds to the number 1, this 
number 1 is the unit element of the ring of integers: 


1 f(1) = I(1) f(1) = (I ∘ f)(1) = f(1), 
f(1) 1 = f(1) I(1) = (f ∘ I)(1) = f(1). 


By an isomorphism we mean a one-to-one homomorphism with respect 
to the operations in question (for a ring, addition and multiplication). 
Since (39) was shown to be one-to-one, we have the theorem: the mapping 
(39) is an isomorphism of the ring of endomorphisms of the module of 
integers onto the ring of integers. 

In order to prove the commutativity of multiplication, we must note 
that by the second distributive law x → xa is an endomorphism: for in 
fact, the image (x + y)a = xa + ya of x + y is the sum of the images 
of x and y. Application of (40) to this endomorphism gives xa = (1a)x = ax, 
so that the ring of integers is a commutative ring. 

In an arbitrary ring (for which we denote multiplication in the same 
way as for the numbers) complete induction on n enables us to generalize 
the distributive laws to 

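$$\Bigl(\sum_{i=1}^{n} a_i\Bigr) b = \sum_{i=1}^{n} a_i b, \qquad a \sum_{i=1}^{n} b_i = \sum_{i=1}^{n} a b_i,$$


$$\sum_{i=1}^{m} a_i \sum_{k=1}^{n} b_k = \sum_{i=1}^{m} \Bigl(a_i \sum_{k=1}^{n} b_k\Bigr) = \sum_{i=1}^{m} \sum_{k=1}^{n} a_i b_k$$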


45 Because of the commutative law (to be proved later) only the second of the two 
distributive laws needs to be proved for the integers. In any case, the first distributive 
law is a simple consequence of (30) and (40). 
and thus by (21) 


(41) $\sum_{i=1}^{m} a_i \sum_{k=1}^{n} b_k = \sum_{1 \le i \le m,\; 1 \le k \le n} a_i b_k$. 


(41’) $\prod_{i=1}^{m} \sum_{k=1}^{n} a_{ik} = \sum_{1 \le k_i \le n\; (i = 1, \ldots, m)} \; \prod_{i=1}^{m} a_{i k_i}$, 


where the index set for the summation on the right-hand side is the set of 
m-tuples $(k_1, \ldots, k_m)$ with $1 \le k_i \le n$ (i = 1, ..., m). 
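
The general distributive law (41’) can be checked on small examples. The sketch below is an illustration only, assuming integer entries $a_{ik}$ stored as a list of m rows of length n; the function names are hypothetical.

```python
from itertools import product as index_tuples
from math import prod


def product_of_sums(a):
    """Left-hand side of (41'): the product over i of the sums over k of a[i][k]."""
    return prod(sum(row) for row in a)


def sum_of_products(a):
    """Right-hand side of (41'): sum over all m-tuples (k_1, ..., k_m)."""
    m, n = len(a), len(a[0])
    return sum(
        prod(a[i][k[i]] for i in range(m))
        for k in index_tuples(range(n), repeat=m)
    )
```

For m = 2 the right-hand side runs over the n² pairs (k₁, k₂), which is the special case (41).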
From the distributive law we further have 


a(c − d) + ad = a((c − d) + d) = ac, 
(a − b)c + bc = ((a − b) + b)c = ac 


and thus 
(42) a(c − d) = ac − ad, 
(42’) (a − b)c = ac − bc. 


Replacing a by a − b in (42), we see from (42’) that 

(a − b)(c − d) = (a − b)c − (a − b)d = ac − bc + (−(ad − bd)), 
and therefore by (36) and (35) 
(43) (a − b)(c − d) = ac − bc + bd − ad = (ac + bd) − (ad + bc). 


Setting c = d and a = b in (42) and (42’) respectively, we obtain 
a0 = 0 = 0c, so that (42), (42’), (43) give 


(43’) a(−d) = −ad, (−b)c = −bc, (−b)(−d) = bd. 


In particular, we have −a = (−1)a = a(−1) if the ring has a unit 
element, which for simplicity we have here denoted by 1. 

Since the product of two natural numbers is always a natural number, 
the equations (43’), when applied to the ring of integers, show that the 
product of a positive with a negative number is a negative number and 
that the product of two negative numbers is a positive number. But an 


46 The definition of a product of several factors is given at the beginning of §3.3. 
integer ≠ 0 is always either positive or negative; thus ab ≠ 0 if a, b ≠ 0; 
or in other words: 


If ab = 0, then a = 0 or b = 0. 


A ring with this property is said to have no divisors of zero. Thus the ring 
of integers is recognized as a commutative ring with unit element that has no 
divisors of zero.⁴⁷ 


Of course, we could also define multiplication of integers by setting, from (43’), 
(−m)n = m(−n) = −mn, (−m)(−n) = mn, 
m0 = 0m = 0, (−n)0 = 0(−n) = 0, 00 = 0, 


for the natural numbers m,n. But then the proof of the rules for calculation 
involves many special cases. 

Multiplication for the integers (in the form in which they have been introduced 
here) could also be defined by setting, from (43), 


(a, a’)(b, b’) = (ab + a’b’, ab’ + a’b) 


as a multiplication for the pairs of natural numbers and then transferring this 
multiplication to the integers by means of the mapping (a, a’) → a − a’. 
Of course, it would then be necessary to show that multiplication of pairs is 
consistent with the equivalence relation ≡ (of p. 107), but at least we would 
escape the disadvantage of having many special cases. 
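
The multiplication of pairs and its consistency with the equivalence relation can be illustrated as follows. This is a sketch only: it assumes that the equivalence of §2.2 is (a, a’) ≡ (b, b’) if and only if a + b’ = a’ + b, and the function names are hypothetical.

```python
def mul_pairs(p, q):
    """(a, a')(b, b') = (ab + a'b', ab' + a'b) for pairs of natural numbers."""
    (a, a1), (b, b1) = p, q
    return (a * b + a1 * b1, a * b1 + a1 * b)


def equivalent(p, q):
    """Assumed equivalence of pairs: (a, a') and (b, b') represent the same integer."""
    return p[0] + q[1] == p[1] + q[0]


def value(p):
    """The integer a - a' represented by the pair (a, a')."""
    return p[0] - p[1]
```

For instance, (1, 3) and (2, 4) both represent −2, and multiplying either of them by (5, 2) gives pairs that represent −6.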


In comparison with these two possibilities, the procedure adopted 
above has the advantage of being independent of the sequence in which 
the integers are introduced, and secondly of not assuming that multiplica- 
tion of the natural numbers has already been defined; it is true that we used 
this multiplication to give us a hint (40) on how to proceed, but the 
subsequent proofs were independent of it.⁴⁸ The endomorphisms by 
means of which we have introduced multiplication for the integers will be 
useful to us again in §4.6 for the multiplication of real numbers. Finally, 
let us remark that from a general point of view the concept of the ring of 
endomorphisms of a module is of great importance in algebra. 


2.5. Order 


For the time being we denote the set N of natural numbers by P. Then 
by what has been proved before we have 


$(44_1)$ 0 ∉ P; 


47 A commutative ring without divisors of zero is also called an integral domain; cf. 
also IB5, §1.9. 

48 Except for the fact that the product of two natural numbers is again a natural 
number; but this fact could easily have been proved by complete induction on the basis 
of the definition (40) (see p. 116). 

$(44_2)$ if 0 ≠ a ∉ P, then −a ∈ P; 
$(44_3)$ if a, b ∈ P, then a + b ∈ P; 
$(44_4)$ if a, b ∈ P, then ab ∈ P. 


In general, in a module (with the operation +) a subset P with the 
properties $(44_1)$ to $(44_3)$ is called a domain of positivity; in a ring a domain of 
positivity must satisfy the additional condition $(44_4)$. From $(44_1)$ and $(44_3)$ it 
follows at once that a, −a ∈ P is impossible. 

The ring of integers has the set N of natural numbers as its only domain 
of positivity. For −1 ∈ P would imply from $(44_4)$ that 1 = (−1)(−1) ∈ P, 
which is inconsistent with −1 ∈ P; thus, by $(44_2)$ we have 1 = −(−1) ∈ P. 
Then complete induction shows at once from $(44_3)$ that N ⊆ P. Now if 
there were an a ∈ P with a ∉ N, then [since a ≠ 0 by $(44_1)$] we would have 
−a ∈ N, and thus −a ∈ P, which is again impossible; thus we have 
proved that N = P. On the other hand, the module of integers has exactly 
one other domain of positivity; for by what we have just proved, the 
domain of positivity of the module must be equal to N if it includes 1, 
and otherwise it must include −1 and therefore all the negative numbers, 
from which we conclude as before that it must coincide with the set of 
negative numbers. 

The existence of a domain of positivity P enables us to define an 
ordering in a module: for we may set a < b (or equivalently, b > a) if 
and only if b − a ∈ P. For the module of integers with P = N this order 
obviously agrees with the order defined for the natural numbers in §1.4. 
Thus a module or ring with a domain of positivity is called an ordered 
module or ring. In an ordered module we can again prove (12) to (15): for 
(12) follows from $(44_1)$ since a − a = 0; and (13) from $(44_3)$ since 
c − a = (c − b) + (b − a); also (15) follows from $(44_2)$ because of (36); 
and finally, (14) from (b + d) − (a + d) = b + d − d − a = b − a. 
Since a < b means the same as b − a ∈ P and b − a = (−a) − (−b), 
it follows from a < b that −b < −a; in other words, in an ordered 
module the mapping x → −x [which by (35) is an endomorphism] is 
monotone decreasing. In an ordered ring we also have, by (42’) and $(44_4)$, 
the monotonic law for multiplication:⁴⁹ 


(45) ac < bc, if a < b and 0 < c. 


Conversely, a module or a ring in which a relation < is defined becomes 
an ordered module or ring if the conditions (12) to (15) (and for a ring 
also (45)) are satisfied by the relation. For the proof of this statement we 


49 If the multiplication is not commutative, there is a second law of the same sort, 
with the factor c on the left. 
define P as the set of x with 0 < x. For then $(44_1)$ follows from (12), and 
$(44_2)$ is proved as follows: 0 ≠ a ∉ P implies by (15) that a < 0, and thus 
by (14) we have 0 = a + (−a) < −a, and therefore −a ∈ P. As for 
$(44_3)$, we see that 0 < a, 0 < b imply by (14) that a < a + b and 
consequently by (13) that 0 < a + b. Finally $(44_4)$ is obtained from (45) 
with a = 0. Since by (14) a < b means the same as 0 < b − a, the relation 
< actually results in this way from P and is therefore the ordering that 
corresponds to the domain of positivity P. 

The assumption 0 < c in (45) is essential; for if c <0, then 0 < —c, 
so that —ac = a(—c) < b(—c) = —bc and therefore bc < ac; of course 
for c = 0 we have ac = bc (= 0). This argument shows that for c ≠ 0 
the endomorphism x → xc of the module of integers (by §2.4 every 
endomorphism is of this form) is a monotone and therefore one-to-one 
mapping⁵⁰ (monotone increasing for c > 0 and monotone decreasing 
for c < 0), whereas for c = 0 the endomorphism is not one-to-one. 

A twofold application of the monotonic law (14) for addition shows that 
if a < b and c < d, then a + c < b + c, b + c < b + d and therefore 
by (13) a + c < b + d; thus, inequalities of the same kind may be added. 
To obtain the same result for multiplication, we must make the additional 
assumption that b, c > 0 or a, d > 0; since otherwise (45) would not be 
applicable. 
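
The role of the extra assumption for multiplication can be seen on concrete integers. The following sketch is illustrative only; the names `in_P` and `less` are hypothetical, and the domain of positivity is taken to be P = N as in the text.

```python
def in_P(x):
    """Membership in the domain of positivity P = N of the ring of integers."""
    return x >= 1


def less(a, b):
    """a < b if and only if b - a lies in P."""
    return in_P(b - a)


# Inequalities of the same kind may be added:
assert less(-3, -1) and less(-2, 1) and less(-3 + -2, -1 + 1)

# Without b, c > 0 (or a, d > 0) they may not be multiplied:
# -3 < -1 and -2 < 1, but (-3)(-2) = 6 is not less than (-1)(1) = -1.
assert not less((-3) * (-2), (-1) * 1)
```

With b, c > 0 the law (45) applies twice, first to ac < bc and then to bc < bd.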


3. The Rational Numbers 


3.1. Introduction of the Rational Numbers 


But now the integers are incomplete with respect to multiplication in the 
same way as the natural numbers were incomplete with respect to addition: 
not every equation xa’ = a has a solution. However, multiplication is 
associative and commutative in exactly the same way as addition, and 
as a substitute for (11) we at least have: if ac = bc and c ≠ 0, then a = b, 
as follows from ac − bc = (a − b)c because of the absence of divisors 
of zero⁵¹ (see §2.4). By the remark in small print at the beginning of §2.2 
(for C we take the set of integers ≠ 0) these properties permit us to follow 
the construction given there for extending a domain, provided we restrict 
ourselves to pairs (a, a’) with a’ ≠ 0. Instead of a − a’ we naturally use 
the symbol a/a’ and replace addition by multiplication. Essential for our 
present purpose is the following fact: in the product (a, a’)(b, b’) = (ab, a’b’), 
defined in analogy with (30), the second number a’b’ of the pair is also 


50 Of course, unless c = 1 or c = −1, not every integer will be an image in this 
mapping; we will have an isomorphism of the module onto a submodule. 

51 Consequently, the procedure we are about to describe may be carried out for any 
arbitrary commutative ring that has no divisors of zero. 
≠ 0 because of the absence of divisors of zero. For the rational numbers 
a/a’ obtained in this way (they are also called fractions) the equation 
a/a’ = b/b’ means the same as ab’ = a’b, so that in particular 
ac/a’c = a/a’ for c ≠ 0. 

Just as for the integers in §2.2, we now admit as statements about the 
rational numbers a/a’, b/b’, ... only such statements about the integers 
a, a’, b, b’,... as do not change their truth values when a/a’, b/b’ are 
replaced by rational numbers that are equal to them (in the sense of 
equality just defined). If, as is customary, we call a’ the denominator of the 
symbol a/a’, then the statement "a/a’ has the denominator a’" is not an 
admissible statement about the rational numbers since, for example, 2/3 
has the denominator 3 but 4/6 does not. Of course, there is no serious 
objection to such statements, even though they are not "equality-invariant," 
provided we clearly understand their special position. But since it is 
possible to avoid them altogether in mathematics, we will find it safer 
and more convenient to exclude them on principle. If we do this, we can 
still speak about the denominator a’ of the rational number a/a’ if we 
assume, for example, that a’ > 0, a ≠ 0 and a, a’ are relatively prime 
(see IB6, §2.6). 


On the other hand, it is customary to speak in school about the numerator 
and the denominator of a fraction a/a’ even when a and a’ have common factors. 
In this case (on account of the order in which the extensions are usually made in 
school) a and a’ are natural numbers. Moreover, it is a common habit to say 
that fractions are equal only if they have the same numerator and the same 
denominator. In this terminology the fractions are actually the pairs of numbers 
(a, a’).⁵² Then to obtain the rational numbers one says that the fractions a/a’, 
b/b’ are equal in value if ab’ = a’b. This equality of value is our equivalence 
relation (denoted by ≡ in the analogous developments in §2.2). For the rational 
numbers one then uses the same symbol a/a’ as for fractions, but equality is 
taken in the sense of equality of value. In contrast to our procedure, in which 
pairs of numbers are denoted by symbols different from those for rational 
numbers, the distinction between fractions and rational numbers is now taken 
into account only in the different concepts of equality. This procedure, which is 
permissible enough in itself, is obscured by the fact that only equality of value 
actually appears in the formulas, so that the symbol "=" always means equality 
of value, whereas the "original" equality of fractions (equality of numerator 
and denominator) occurs only in informal statements. Here again (in analogy 
with the use of residue classes or of the equivalence relation ≡ mentioned 
on p. 109) we may consider a rational number as the set of fractions that are 
equal in value to a given fraction a/a’; then the rational number is represented 
by the fraction a/a’ (or by any other fraction with the same value), and equality 
of value of fractions means exactly the same as equality of the rational numbers 
represented by them (in the sense of the definition of equality for sets; 
see IA, §7.2). 


52 See also Vogel [1]. 
The equations (32) now become ca’/a’ = c, c = ca’/a’, and we have 
(a/a’)(b/b’) = ab/a’b’. Instead of the 0 in §2.3 we obtain for any number 
a ≠ 0 the fraction a/a as the neutral element for the multiplication of 
rational numbers; but now, since a/a = 1, this element is not a newly 
adjoined element but simply the number 1, namely the unit element of 
the ring of integers. If a, a’ ≠ 0, then a’/a is the inverse of a/a’; it is also 
written $(a/a')^{-1}$ and is called the reciprocal of a/a’. Thus the rational 
numbers ≠ 0 form a module⁵³ with respect to multiplication, and for 
given rational numbers α, β with α ≠ 0 there exists exactly one rational 
number ξ with ξα = β. In agreement with our use up to now of the 
solidus /, this number is denoted by β/α. In analogy with (34), (35), (36) 
we have the equations $(\alpha^{-1})^{-1} = \alpha$, $(\alpha\beta)^{-1} = \beta^{-1}\alpha^{-1}$, $(\alpha/\beta)^{-1} = \beta/\alpha$ for 
all α, β ≠ 0. 

It must be noted that the remarks at the end of §2.3 have no analogy 
in the theory of multiplication: it is not true that every noninteger is the 
reciprocal of an integer. The explanation is that at that time (end of §2.3) 
we made use of the properties (12), (13) of the relation <. In analogy with 
< we now define the relation | (to be read: factor of) in the domain of 
integers: a | b if and only if there exists an integer c with ac = b. Then 
the statements analogous to (13), (14) are valid: from a | b, b | c follows 
a | c; from a | b follows ad | bd. But in contrast to (12) we always have 
a | a, and in contrast to (15) neither 2 | 3 nor 2 = 3 nor 3 | 2.⁵⁴ 


3.2. The Field of Rational Numbers 


But how shall we define addition for the rational numbers introduced 
above? It is natural to lay down the following three requirements: when 
applied to the integers, the new addition gives the same results as the old; 
under the new addition the rational numbers form a module; and finally, 
multiplication is distributive with respect to addition. If we examine the 
equations ξa’ = a, ηb’ = b (where a, b are integers, and ξ, η are rational 
numbers), it follows from these requirements that: 


(ξ + η)a’b’ = ab’ + ba’. 


For the pairs A = (a, a’), B = (b, b’) it will be natural in the present 
context (in contrast to §2.2) to define addition as follows: 


A + B = (ab’ + a’b, a’b’). 


53 When speaking of multiplication, it is customary to use the term "commutative 
group" rather than "module." 

54 In the terminology introduced in IA, §8.3 the relation < is an ordering, whereas | is 
only a partial ordering, and even then only under the restriction to natural numbers 
(see IB6, §2.2). The relation | is defined in any commutative ring without divisors of 
zero (cf. IB5, §2.1). 
This addition is applicable to all pairs and is commutative. In order to 
carry it over to the rational numbers by means of the mapping (a, a’) → a/a’ 
we must show, in accordance with the remarks in §2.2, that the equivalence 
relation ≡ is consistent with it, where A ≡ B now means ab’ = a’b. 
To do this, we prove (31) with the new meaning of ≡, namely that from 
A ≡ C, or in other words from ac’ = a’c, it follows that (ab’ + a’b) b’c’ 
= (cb’ + c’b)a’b’, so that A + B ≡ C + B. After this proof of 
consistency we may define addition of the rational numbers by 


(46) a/a’ + b/b’ = (ab’ + a’b)/a’b’. 


Since a = a/l, the new addition agrees with the old for integers. 
Commutativity of the new addition is clear, and its associativity is easily 
shown as follows: 


(a/a’ + b/b’) + c/c’ = (ab’ + a’b)/a’b’ + c/c’ 
= (ab’c’ + a’bc’ + a’b’c)/a’b’c’, 
a/a’ + (b/b’ + c/c’) = a/a’ + (bc’ + b’c)/b’c’ 
= (ab’c’ + a’bc’ + a’b’c)/a’b’c’. 

Since a/a’ + 0 = a/a’ + 0/a’ = (a + 0)/a’ = a/a’ and a/a’ + (−a)/a’ = 
(a + (−a))/a’ = 0/a’ = 0, the rational numbers form a module with 
respect to addition. Furthermore, 


(a/a’)(c/c’) + (b/b’)(c/c’) = ac/a’c’ + bc/b’c’ = (ab’c + a’bc)c’/a’b’c’c’ 
= (ab’ + a’b)c/a’b’c’ = (a/a’ + b/b’)(c/c’). 


Consequently, the distributive law holds and therefore the rational 
numbers form a commutative ring with respect to addition and multi- 
plication. But this ring has the special property that the nonzero elements 
in it form a module with respect to multiplication, so that every equation 
ξα = β (α ≠ 0) has a solution. 
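
The construction of the rational numbers from pairs of integers can be sketched in a few lines. This is an illustration under stated assumptions, not part of the original development: a pair (a, a’) with a’ ≠ 0 is stored as a Python tuple, and the names `equal`, `add`, `mul` are hypothetical.

```python
def equal(p, q):
    """a/a' = b/b' means ab' = a'b."""
    return p[0] * q[1] == p[1] * q[0]


def add(p, q):
    """Definition (46): a/a' + b/b' = (ab' + a'b)/a'b'."""
    return (p[0] * q[1] + p[1] * q[0], p[1] * q[1])


def mul(p, q):
    """(a/a')(b/b') = ab/a'b'."""
    return (p[0] * q[0], p[1] * q[1])
```

For example, `add((2, 3), (1, 6))` yields the pair (15, 18), which is equal in value to (5, 6); replacing (2, 3) by the equal fraction (4, 6) yields (30, 36), again equal in value to (5, 6), as the proof of consistency requires.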

Now a field is defined as a ring with the following properties: 


1. For arbitrary elements α, β of the ring with α ≠ 0, there exists an 
element ξ in the ring with ξα = β. 

2. Multiplication is commutative. 

Thus by what has just been proved the rational numbers form a field, the 
field of rational numbers. The only property of the integers used in the 
construction of this field is that they form a commutative ring without 
divisors of zero.⁵⁵ Every such ring R can therefore be extended to a field 


55 The occasional use of the unit element 1 in the ring of integers could easily have 
been avoided. 
that consists, as in the case of the rational numbers, of the symbols a/a’ 
with a, a’ ∈ R, a’ ≠ 0; here the a/a’ are called quotients and the field is 
called the quotient field of R. A field cannot have divisors of zero, since it 
follows from αβ = 0 and α ≠ 0 that $\beta = (\alpha^{-1}\alpha)\beta = \alpha^{-1}(\alpha\beta) = \alpha^{-1} 0 = 0$. 


We could have obtained the field of rational numbers by introducing 
subtraction and division in the opposite order; beginning with the natural 
numbers we would then have defined the positive rational numbers (cf. §3.4) 
in the form a/a’ (a, a’ being natural numbers); in this domain we would introduce 
addition as before and then apply to it the procedure which led from the natural 
numbers to the integers. In this case the rational numbers appear in the 
form a/a’ — b/b’. 

Our chief reason for not adopting this procedure is that then the ring of 
integers, which is of great importance in algebra, does not appear as an inter- 
mediate stage. Of course, such an objection does not mean that the procedure 
may not be otherwise convenient. For example, it is usually adopted in school. 


3.3. Powers 
In the present section, except where otherwise mentioned, lower-case 
italicized letters denote the elements of an arbitrary field.⁵⁶ With 
multiplication in place of addition and $\prod$ in place of $\sum$ we can again use the 
definition (8): 


$$\prod_{i=1}^{1} a_i = a_1, \qquad \prod_{i=1}^{n+1} a_i = \Bigl(\prod_{i=1}^{n} a_i\Bigr) a_{n+1} \quad (n \text{ a natural number}).$$


Since the properties of addition used in §1.3 also hold for multiplication, 
we have the result corresponding to (9): 


$$\prod_{i=1}^{n} a_i \prod_{i=1}^{m} a_{n+i} = \prod_{i=1}^{n+m} a_i \quad (n, m \text{ natural numbers}).$$


Now we shall call $\prod_{i=1}^{n} a$ the nth power $a^n$ of a. From (10) we have for 
natural m, n 


(47) $a^n a^m = a^{n+m}$. 


We may also take over (18) and thus introduce $\prod_{i \in I} a_i$. Corresponding 
to (21) we get the equation 


(48) $\prod_{k=1}^{n} \prod_{i=1}^{m} a_{ik} = \prod_{i \le m,\; k \le n} a_{ik} = \prod_{i=1}^{m} \prod_{k=1}^{n} a_{ik}$ (m, n natural numbers). 


56 As far as positive exponents are concerned, the developments are valid in an 
arbitrary commutative ring, or even in an arbitrary (multiplicatively written) Abelian 
group, where commutativity is needed only for the proof of (48) and (50). 
With $a_{ik} = a$ we have 
(49) $(a^m)^n = a^{mn}$ (m, n natural numbers) 


since from (21), with $a_{ik} = 1$, there exist exactly mn pairs of natural 
numbers (i, k) with i ≤ m, k ≤ n. For m = 2 and $a_{1k} = a$, $a_{2k} = b$ we 
have from (48) 


(50) $(ab)^n = a^n b^n$. 


This definition of a power is now extended by the convention $a^0 = 1$, 
where it is often assumed that a ≠ 0. The value of $a^0$ for a ≠ 0 must be 
defined in exactly this way if (47) is to remain valid: namely, $a^1 a^0 = a^1$ and 
thus $a a^0 = a$. Then it is easy to show by calculation that not only (47) but also 
(49), (50) hold for all integers m, n ≥ 0. If we wish to define $a^{-n}$ for every 
natural number n in such a way that (47) remains valid, we must have 
$a^n a^{-n} = a^0 = 1$, which shows that a ≠ 0 is a necessary restriction. Thus 
$a^{-n} = (a^n)^{-1}$ for every natural number n > 1; and of course this equation 
also holds for n = 1. In the proof of (47) for this extended case we may 
restrict ourselves, on account of the commutativity of multiplication and 
addition, to replacing n by −n. For every nonnegative integer m we then 
obtain 


$$
a^{-n} a^{m} = (a^{n})^{-1} a^{m} =
\begin{cases}
a^{m-n} & \text{for } m > n, \\
1 & \text{for } m = n, \\
(a^{n}/a^{m})^{-1} = (a^{n-m})^{-1} & \text{for } m < n;
\end{cases}
$$


for we have $a^{m-n} a^{n} = a^{m}$ for m > n and $a^{n-m} a^{m} = a^{n}$ for m < n. 
Since for natural numbers m, n we also have $a^{-n} a^{-m} = (a^{n})^{-1} (a^{m})^{-1} = 
(a^{n+m})^{-1} = a^{-(n+m)} = a^{(-n)+(-m)}$, we see that (47) has now been proved 
for arbitrary exponents. In order to make the corresponding extension of 
(49), we need the rule 


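$(a^{-1})^{n} = (a^{n})^{-1}$ (n a natural number), 


which follows at once from the equation $(a^{-1})^{n} a^{n} = 1$ [see (50)]. For 
natural numbers m, n we now obtain 


$$
\begin{aligned}
(a^{-m})^{n} &= ((a^{m})^{-1})^{n} = ((a^{m})^{n})^{-1} = (a^{mn})^{-1} = a^{-mn} = a^{(-m)n}, \\
(a^{m})^{-n} &= ((a^{m})^{n})^{-1} = (a^{mn})^{-1} = a^{-mn} = a^{m(-n)}, \\
(a^{-m})^{-n} &= (((a^{m})^{-1})^{n})^{-1} = (((a^{m})^{n})^{-1})^{-1} = (a^{m})^{n} = a^{mn} = a^{(-m)(-n)}.
\end{aligned}
$$

Finally, it is also easy to prove (50) for negative exponents: 


$$(ab)^{-n} = ((ab)^{n})^{-1} = (a^{n} b^{n})^{-1} = (a^{n})^{-1} (b^{n})^{-1} = a^{-n} b^{-n}.$$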
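
The extension of powers to arbitrary integral exponents can be tried out in the field of rational numbers. The sketch below is illustrative only: the function name `power` is hypothetical, and Python's `fractions.Fraction` stands in for the field.

```python
from fractions import Fraction


def power(a, n):
    """a^n for an integer n, with a^0 = 1 and a^(-n) = (a^n)^(-1) for a != 0."""
    if n < 0:
        if a == 0:
            raise ZeroDivisionError("a^(-n) is defined only for a != 0")
        return 1 / power(a, -n)
    result = Fraction(1)
    for _ in range(n):               # recursive definition: a^(n+1) = a^n * a
        result = result * a
    return result
```

With a = 2/3 one finds, for example, `power(a, -2) == Fraction(9, 4)`, and the rules (47), (49), (50) can be checked for all small positive and negative exponents.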


3.4. Order 

With a view to extending the ordering of the integers to the rational 
numbers, let us first examine the properties which a domain of positivity 
P of the field of rational numbers must have, in case such a domain 
exists. The set of integers contained in P is obviously a domain of positivity 
for the ring of integers, so that in particular N ⊆ P. But from $(44_2)$, $(44_4)$ and 
$a^2 = (-a)^2$ it follows that a domain of positivity of a ring must contain 
all squares ≠ 0. Thus for a, b ∈ N it follows from $a/b = ab(b^{-1})^2$ that 
a/b ∈ P. But if P contained a rational number that is not of this form, 
then by $(44_1)$ there would exist natural numbers a, b with (−a)/b ∈ P, 
which contradicts $(44_3)$, since a/b + (−a)/b = 0 ∉ P and a/b ∈ P. Thus P 
consists precisely of the quotients of natural numbers. But these quotients 
do in fact form a domain of positivity, since a rational number ≠ 0 which 
is distinct from these quotients has the form (−a)/b (a, b ∈ N), which 
means that $(44_2)$ is valid, while $(44_1)$, $(44_3)$, $(44_4)$ obviously hold. Thus the field of 
rational numbers can be ordered in exactly one way. Since 


b/b’ − a/a’ = b/b’ + (−a)/a’ = (ba’ + b’(−a))/a’b’ = (ba’ − b’a)/a’b’, 


we have for integers a, a’, b, b’: a/a’ < b/b’ for a’, b’ > 0, if and only if 
ab’ < a’b. Since c > 0 implies $c^{-1} = c(c^{-1})^2 > 0$ for every rational 
number c, this result can easily be extended by (45) to arbitrary rational 
numbers a, a’, b, b’. 

The ordering of the rational numbers is Archimedean; that is, for every 
a, b > 0 there exists a natural number n with na > b.⁵⁷ For the proof we 
first restrict ourselves to integers a, b. Then from b + 1 > b and a ≥ 1 
it follows by the monotonic law that (b + 1)a > ba ≥ b1 = b, so that 
na > b with n = b + 1. But then for the rational numbers a/a’, b/b’ > 0 
(a, a’, b, b’ natural numbers) we have a/a’ = ab’/a’b’, b/b’ = a’b/a’b’, so 
that if we choose a natural number n with nab’ > a’b, it follows by 
multiplication with the positive number $(a'b')^{-1}$ that n(a/a’) > b/b’, as 
desired. In an arbitrary module, which may not contain the natural 
numbers, we can always define na as being equal to $\sum_{i=1}^{n} a$, so that the 
definition of “Archimedean” is applicable to any ordered module. At the 
end of §4.3 we give an example of an ordered module in which the ordering 
is not Archimedean. 
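
The proof gives an explicit multiplier. In the sketch below (an illustration; the function name is hypothetical and `fractions.Fraction` is used only for the final comparison) the integer case n = b + 1 is applied to the natural numbers ab’ and a’b, so that n = a’b + 1.

```python
from fractions import Fraction


def archimedean_multiple(a, a1, b, b1):
    """For natural numbers a, a', b, b' return an n with n(a/a') > b/b'.

    As in the proof, n is chosen so that n(ab') > a'b, namely n = a'b + 1.
    """
    return a1 * b + 1


n = archimedean_multiple(1, 7, 5, 2)     # compare 1/7 with 5/2
assert n * Fraction(1, 7) > Fraction(5, 2)
```

The multiplier found in this way is in general not the smallest possible one; the proof needs only its existence.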

The ordering of a field⁵⁸ is Archimedean if and only if $0 \le a \le n^{-1}$ 
for all natural numbers n implies a = 0. For if a > 0 in an Archimedean 


57 This statement is often called the "axiom of Archimedes" since it occurs as an 
axiom in geometry (cf. 112, §1.2). 

58 If the field does not contain the natural numbers, then in the following inequality 
(and in the proof) the natural number n must be replaced by the nth multiple of the unit 
element 1 of the field. 

ordering, then there exists a natural number n with na > 1, so that 
$a > n^{-1}$. The same argument obviously holds for an ordered ring that 
contains $n^{-1}$ for every natural number n. Conversely, if $0 \le a \le n^{-1}$ 
implies a = 0, and if a, b > 0, so that a/b > 0, then the inequality 
na ≤ b cannot hold for every natural number n, since it would imply 
$a/b \le n^{-1}$. 

The absolute value |a| of the rational number a is defined as follows: 


(51) |a| = max(a, −a); 


where by max(a, b) we mean the number b if a ≤ b and the number a if 
a > b. Thus |a| = a or = −a and |a| ≥ 0, and these properties 
obviously characterize the number |a|. Then we can at once derive 


(52) |ab| = |a||b|. 

Since ±a ≤ |a| for all a, we have ±(a + b) ≤ |a| + |b| and therefore 

(53) |a + b| ≤ |a| + |b|. 


Replacing b by −b, we see, since |−b| = |b|, that |a − b| ≤ |a| + |b|, 
and if we replace a − b by a, and a by a + b, we have 


|a| ≤ |a + b| + |b|, and therefore |a| − |b| ≤ |a + b|. 


Since the right-hand side is not altered by the interchange of a and b, 
it follows that 


(54) ||a| − |b|| ≤ |a + b|. 


As in (53), we may replace a + b by a — b. Of course, the definition (51) 
of absolute value, and with it the consequences (52), (53), (54), are valid 
for any ordered ring. 
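
The definition (51) and its consequences can be checked on sample rational numbers. The following is a sketch only, with the hypothetical name `absolute` and with `fractions.Fraction` standing in for the ordered field.

```python
from fractions import Fraction


def absolute(a):
    """Definition (51): |a| = max(a, -a)."""
    return max(a, -a)


a, b = Fraction(-3, 4), Fraction(5, 6)
assert absolute(a * b) == absolute(a) * absolute(b)                  # (52)
assert absolute(a + b) <= absolute(a) + absolute(b)                  # (53)
assert absolute(absolute(a) - absolute(b)) <= absolute(a + b)        # (54)
```

Since only the order and the ring operations enter, the same check applies unchanged to integers.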


3.5. Endomorphisms 


In view of the distributive law, the mapping x — cx for any rational 
number c is an endomorphism of the module (with respect to addition) 
of the rational numbers. As in §2.4, we can show that for two endomor- 
phisms f, g of this module, the equality f(1) = g(1) implies f(x) = g(x) 
for all integers. From (38) it is easy to prove $f\bigl(\sum_{i=1}^{n} x_i\bigr) = \sum_{i=1}^{n} f(x_i)$ by 
complete induction. Thus for a rational a/a’ (a, a’ integers, a’ > 0) and 
two endomorphisms f, g with f(1) = g(1) we have: 


$$a' f(a/a') = \sum_{i=1}^{a'} f(a/a') = f\bigl(a'(a/a')\bigr) = f(a) = g(a) = a' g(a/a').$$

But then it follows that f(a/a’) = g(a/a’), so that an endomorphism f 
is completely determined by the value of f(1). Since the image of 1 under 
the endomorphism x → cx is the number c, the mapping f → f(1) is a 
one-to-one mapping of the ring of endomorphisms onto the field of rational 
numbers. In fact, this mapping is even an isomorphism, as can be proved in 
exactly the same way as the corresponding statement in §2.4. As in the 
corresponding case for the integers, the ring of endomorphisms of the 
additive module of the rational numbers is isomorphic to the field of 
rational numbers. 

Furthermore, as in §2.5, the endomorphisms ≠ 0 are monotone 
mappings: x → cx is monotone increasing for c > 0 and monotone 
decreasing for c < 0. 

It is obvious that the endomorphism x → cx is also a homomorphism 
with respect to multiplication if and only if c(xy) = (cx)(cy) for all 
x, y, or in other words, if and only if c² = c. But in view of the absence of 
divisors of zero, the equation c² − c = c(c − 1) shows that c² = c only 
for c = 0 or c = 1. Thus the field of rational numbers has exactly two 
homomorphisms (with respect to addition and multiplication) into 
itself, namely the zero mapping x → 0 and the identity mapping x → x. 
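
The condition c² = c can be observed on a finite sample. This sketch is an illustration only: `is_multiplicative` is a hypothetical helper that tests c(xy) = (cx)(cy) on a list of sample values, which gives a necessary condition but of course not a proof for all x, y.

```python
from fractions import Fraction


def is_multiplicative(c, samples):
    """Test c(xy) = (cx)(cy) for all x, y taken from the sample list."""
    return all(c * (x * y) == (c * x) * (c * y) for x in samples for y in samples)


samples = [Fraction(p, q) for p in range(-2, 3) for q in (1, 2, 3)]
candidates = [Fraction(p, 2) for p in range(-4, 5)]          # -2, -3/2, ..., 2
survivors = [c for c in candidates if is_multiplicative(c, samples)]
assert survivors == [Fraction(0), Fraction(1)]
```

Because the sample contains x = y = 1, every surviving c satisfies c = c², so only 0 and 1 remain.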


4. The Real Numbers 


4.1. Decimal Fractions 

In the present section we denote by g a fixed integer g > 1, which we 
call a base. For any given positive rational number r we now use g to 
determine⁵⁹ the sequence of integers $a_n$ (n = 0, 1, 2, ...) by recursion in 
the following way: $a_0$ is the greatest integer ≤ r; $a_{n+1}$ (n ≥ 0) is the greatest 
integer $\le \bigl(r - \sum_{i=0}^{n} a_i g^{-i}\bigr) g^{n+1}$. With the abbreviation $r_n = \sum_{i=0}^{n} a_i g^{-i}$ 
we then have $a_{n+1} \le (r - r_n) g^{n+1} < a_{n+1} + 1$, $a_{n+1} g^{-(n+1)} = r_{n+1} - r_n$ 
and consequently (with n in place of n + 1) 


(55) $r_n \le r < r_n + g^{-n}$ (n = 0, 1, 2, ...). 


It follows that $0 \le r - r_n < g^{-n}$ and thus $0 \le a_{n+1} < g$, so that 
$0 \le a_n < g$ (n = 1, 2, ...). In view of (55) we can also describe $r_n$ as the 
greatest integral multiple of $g^{-n}$ which is ≤ r. The sequence of the $a_n$ 
determines r uniquely; for if (55) holds for r’ as well as for r, then 


$-g^{-n} < r' - r < g^{-n}$ for all natural numbers n. 


59 The principle of recursion at the end of §1.2 shows that the function $n \to a_n$ is 
uniquely determined by the above requirements; the fact that in the present case n may 
take the value 0 represents only an insignificant change from §1.2. 


Since g − 1 > 0, we have by the binomial theorem (see IB4, §1.3) 


$$g^{n} = (1 + (g - 1))^{n} > n(g - 1)$$


and thus $g^{-n} < (g - 1)^{-1} n^{-1} \le n^{-1}$, so that no positive number can be 
$< g^{-n}$ for all natural numbers n. Thus r’ < r or r < r’ would lead to a 
contradiction, so that we must have r’ = r. 
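
The recursion for the digits $a_n$ can be carried out mechanically. The following sketch is an illustration under stated assumptions: the names `digits` and `partial_sum` are hypothetical, r is given as a `fractions.Fraction`, and only finitely many digits are computed.

```python
from fractions import Fraction
from math import floor


def digits(r, g, count):
    """Return [a_0, a_1, ..., a_count] for the rational number r >= 0 to the base g."""
    a = [floor(r)]                       # a_0 is the greatest integer <= r
    r_n = Fraction(a[0])                 # r_n = sum of a_i * g^(-i) for i <= n
    for n in range(count):
        next_digit = floor((r - r_n) * g ** (n + 1))   # greatest integer <= (r - r_n) g^(n+1)
        a.append(next_digit)
        r_n += Fraction(next_digit, g ** (n + 1))
    return a


def partial_sum(a, g):
    """r_n for the digit list a = [a_0, ..., a_n]."""
    return sum(Fraction(d, g ** i) for i, d in enumerate(a))
```

For r = 22/7 and g = 10 the first digits are 3, 1, 4, 2, 8, 5, 7, and each partial sum $r_n$ satisfies the inequality (55).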

In accordance with the usual practice for the base 10 we write the 
sequence of the $a_n$ in the form $a_0.a_1a_2a_3 \ldots$ and call it an infinite decimal.⁶⁰ 
Since this sequence can be regarded as a complete substitute for the number 
r, we write 


(56) $r = a_0.a_1a_2a_3 \ldots$ . 


But we now encounter the following extremely significant fact: although 
every rational number gives rise in this way to an infinite sequence of 
nonnegative integers <g, it is not true that every such sequence can be 
obtained from some rational number (for an example see §4.8). In order 
to extend the field of rational numbers, it therefore seems appropriate to 
consider all infinite sequences of nonnegative integers $a_n$ (n = 0, 1, 2, ...) 
with $a_n < g$ for n > 0; these sequences are to be taken as the elements of a 
domain of numbers which in view of (56) contains⁶¹ the rational numbers 
>0. Then for these new numbers we must define equality, order, and 
addition in such a way that when applied to the rational numbers in 
accordance with (56) they will yield the same results as the corresponding 
concepts already defined for rational numbers. For the definition of 
equality it is natural to set 


$a_0.a_1a_2a_3 \ldots = b_0.b_1b_2b_3 \ldots$ 


if and only if $a_n = b_n$ for all n = 0, 1, 2, .... As an ordering we take the 
natural lexicographic ordering:⁶² $a_0.a_1a_2a_3 \ldots < b_0.b_1b_2b_3 \ldots$ if and only 
if there exists a nonnegative integer n with $a_i = b_i$ for all i < n and 
$a_n < b_n$. The definition of addition is necessarily somewhat lengthy, 


60 Of course, from the etymological point of view the word "decimal" ought to be
replaced by some other word corresponding to the value of g; for g = 2 the phrase
dyadic fractions is also used.

61 For brevity we restrict ourselves here to the numbers $\ge 0$. For a negative number r
we define the $a_n$ as the numbers corresponding to $-r$ in the above procedure and then
we set $r = -(a_0.a_1a_2a_3\ldots)$.

62 So-called because the words in a lexicon are arranged by this principle; the letters
in alphabetic order correspond here to the nonnegative numbers in the order introduced
above. Of course, lexicons do not normally contain words with infinitely many letters.

so that we shall content ourselves here with defining the sum

$a_0.a_1a_2a_3\ldots + g^{-k} = b_0.b_1b_2b_3\ldots$ :


If $a_k \ne g - 1$ or $k = 0$, we set

$b_n = a_n$ for $n \ne k$,
$b_k = a_k + 1$,


but if $a_n = g - 1$ for $h < n \le k$ and (if $h > 0$) $a_h \ne g - 1$, we set


$b_n = a_n$ for $n < h$ and $n > k$,
$b_n = 0$ for $h < n \le k$,
$b_h = a_h + 1$.


One reason for choosing this definition is that for infinite decimals (56) 
that are equal to rational numbers it is a readily provable rule. 
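
The two cases of this definition amount to the familiar carry rule. The following is a minimal Python sketch of it, under the assumption that the expansion is given as a finite list of digits `[a_0, ..., a_N]` with N at least k (the rule never changes a digit beyond position k, so a finite prefix suffices). The function name `add_unit` is hypothetical and chosen only for illustration.

```python
def add_unit(digits, k, g=10):
    """Return the digits b_n of a_0.a_1a_2... + g**(-k).

    `digits` is a finite prefix [a_0, a_1, ..., a_N] with N >= k;
    digits after position k are never changed by the rule.
    """
    b = list(digits)
    if k == 0 or digits[k] != g - 1:
        # First case: a_k != g-1 or k = 0, so only the k-th digit changes.
        b[k] = digits[k] + 1
        return b
    # Second case: find h < k with a_n = g-1 for h < n <= k
    # and (if h > 0) a_h != g-1.
    h = k
    while h > 0 and digits[h] == g - 1:
        h -= 1
    for n in range(h + 1, k + 1):
        b[n] = 0          # b_n = 0 for h < n <= k
    b[h] = digits[h] + 1  # b_h = a_h + 1
    return b
```

For example, in base 10 the call `add_unit([0, 1, 9, 9, 5], 3)` carries through the two nines and changes the digit at position h = 1.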

But now there is a difficulty. In the case $a_n = g - 1$ for $n > k$,
$a_k \ne g - 1$ or $k = 0$ our definitions (of order and addition) give for all
$n > k$:


$a_0.a_1a_2a_3\ldots + g^{-n} > \sum_{i=0}^{k} a_i g^{-i} + g^{-k} = a_0.a_1 \ldots a_{k-1}(a_k + 1)000\ldots$ .

If the monotonic law is to hold for addition and if subtraction is to be
possible (for the case when the subtrahend is smaller than the minuend),
we have the following inequality (cf. the calculation given above) for the
difference $d = \sum_{i=0}^{k} a_i g^{-i} + g^{-k} - a_0.a_1a_2a_3\ldots$ :


$d < g^{-n} < (g - 1)^{-1} n^{-1}$ for all $n > k$.


But $d > 0$, so that there exists a positive rational number $< d < (g - 1)^{-1} n^{-1}$
for all $n > k$, in contradiction to the fact that the ordering of the rational
numbers is Archimedean. The solution of the difficulty lies, of course, in
excluding the sequences with $a_n = g - 1$ for all $n > k$. In fact, such
sequences do not occur in the decimal expansions of rational numbers.
For if $a_n = g - 1$ for all $n > k$, it follows that


$r_n - r_k = (g - 1) \sum_{i=k+1}^{n} g^{-i} = (g - 1) g^{-n} \sum_{h=0}^{n-k-1} g^{h} = g^{-n}(g^{n-k} - 1) = g^{-k} - g^{-n}$,


so that $r_n + g^{-n} = r_k + g^{-k}$; but then from (55) we would have

$0 < (r_k + g^{-k}) - r \le g^{-n} < (g - 1)^{-1} n^{-1}$ for all $n > k$


in contradiction to the fact that the ordering of the rational numbers is
Archimedean.

In this way the real numbers can be introduced as infinite decimals,
and it can be shown that they do in fact form an ordered module (with
Archimedean ordering) that includes the module of the rational numbers.
Let us now consider the following theorem, which is of basic importance
in analysis: every non-empty set of real numbers which is bounded below⁶³
has an infimum, or greatest lower bound. This theorem is now very easy to
prove. For by the addition of a sufficiently large number (namely $-s$,
if s is a negative lower bound) the set M becomes a set of nonnegative
numbers, and then $r_n$ is defined, in accordance with (55), as that integral
multiple of $g^{-n}$ with $r_n \le x$ for all $x \in M$ such that there exists a number
$x \in M$ with $x < r_n + g^{-n}$. As in (55), there then exist integers
$a_n$ ($n = 0, 1, 2, \ldots$) with $r_n = \sum_{i=0}^{n} a_i g^{-i}$ and $0 \le a_n < g$ ($n = 1, 2, \ldots$).
From $r_k \le x < r_k + g^{-k}$ it then follows as before that $a_n = g - 1$ for
$n > k$ is impossible, so that $a_0.a_1a_2a_3\ldots$ is actually a real number, which
is easily recognized as the greatest lower bound of M.


4.2. A Survey of Various Possible Procedures 


From the point of view of practical calculation the introduction of the real 
numbers by means of decimal expansions as described above in §4.1 is very 
natural and has the advantage that the concepts involved in it are relatively 
simple. A further advantage is that if we use decimal expansions, we can 
introduce the real numbers immediately after adjoining zero to the natural 
numbers without first introducing the integers and then the rational 
numbers. It is convenient to introduce only the nonnegative real numbers 
at first and then to apply to them the method of extension described in 
§2.3. 

But these advantages are obtained at high cost; addition can only be
defined in a very lengthy way,⁶⁵ and the rules for calculation are not very
convenient to prove. These disadvantages obviously arise from the special
form of the $r_n$. In order to avoid them, it will be convenient to replace
these $r_n$ by more general entities, for which we shall naturally wish to
preserve certain properties of the $r_n$. To do this we may start from either
of two facts:


1) $a_0.a_1a_2a_3\ldots$ is the least upper bound of the set of $r_n$.


2) $a_0.a_1a_2a_3\ldots$ is the limit of the sequence of $r_n$; for we have


63 A non-empty set that is bounded above can be reduced to the present case by the
mapping $x \to -x$ and is thus shown to have a supremum, or least upper bound.

64 Since the intermediate stages will be important to us later, we have not followed
this plan here.

65 To say nothing of multiplication, which we shall discuss in a separate section.
See, for example, F. A. Behrend, A contribution to the theory of magnitudes and the
foundations of analysis, Math. Zeitschr. 63, 345-362 (1956).

$0 \le a_0.a_1a_2a_3\ldots - r_n < g^{-n} < (g - 1)^{-1} n^{-1}$, and for every real number
$\varepsilon > 0$ we can find a natural number $n_0$ with $n_0(g - 1) > \varepsilon^{-1}$, whereupon
$|a_0.a_1a_2a_3\ldots - r_n| < \varepsilon$ for all $n > n_0$.


The first of these two facts suggests that, in a completely general way,
we may take as our starting point all non-empty sets of rational numbers
bounded from above.⁶⁶ This procedure, discussed in §4.3, is essentially
the method of Dedekind for defining the real numbers by Dedekind cuts.
It has the advantage that it defines the real numbers and their order without
making any use of addition, so that it can be applied to more general
systems in which only an order is defined.⁶⁷

On the other hand, the second of the above listed facts is used as our 
starting point in §4.4 to introduce the real numbers by means of the 
fundamental sequences of Cantor; that is, sequences of rational numbers 
which satisfy the Cauchy criterion for convergence. Since the definition 
of this quite general class of sequences does not require the ordering of the 
rational numbers but only of their absolute values, it too can be extended 
to a more general class of modules than the ordered ones.⁶⁸ In this case,
however, the addition of rational numbers is already employed in the very 
definition of real numbers. 

The introduction of the real numbers by means of nested intervals is to
a certain extent a mixed procedure. Here we employ pairs of sequences
of rational numbers $(a_n)_{n=1,2,\ldots}$, $(a'_n)_{n=1,2,\ldots}$ with $a_n \le a_{n+1} \le a'_{m+1} \le a'_m$
for all m, n and $\lim_{n\to\infty}(a'_n - a_n) = 0$.⁶⁹ Since both order and addition
are required here, the usefulness of this procedure for more general
systems is considerably reduced, so that we shall not deal with it in detail
in the following sections but shall content ourselves with the following
remarks. The nests of intervals $\alpha$ and $\beta$ arising from the pairs of sequences
with terms $a_n$, $a'_n$ and $b_n$, $b'_n$ are said to be equal if and only if for every
index pair m, n there exists a rational number x with $a_m \le x \le a'_m$,
$b_n \le x \le b'_n$. The number $\alpha$ is set equal to the rational number r if
$a_n \le r \le a'_n$ for all n. By the sum $\alpha + \beta$ we mean the nest of intervals
defined by the sequences $(a_n + b_n)_{n=1,2,\ldots}$, $(a'_n + b'_n)_{n=1,2,\ldots}$, where it
remains to be proved that when equality is defined as above, this sum
depends only on $\alpha$, $\beta$. If we add to $\alpha$ the nest of intervals defined by the
sequences with terms $-a'_n$, $-a_n$ the sum is equal to the rational number
0, which is easily seen to be a neutral element for the addition defined


66 Of course, we could just as well consider the non-empty sets bounded from below.

67 To a certain extent (see §4.3) we only require a partial order.

68 See, for example, van der Waerden [2], §§74, 75. In topology the same procedure
is followed with even greater generality for metric spaces.

69 Here $a_n$ and $a'_n$ are regarded as the endpoints of an interval, which may reduce to
a single point. Such intervals are said to be nested within one another.

in this way. Since associativity and commutativity are obvious, the real
numbers defined as nests of intervals form a module, for which the set of
$\alpha \ne 0$ with $a'_n > 0$ is seen to be a domain of positivity. The least upper
bound of a non-empty set of real numbers bounded from above is most 
easily constructed by the principle of nesting of intervals: let us first choose 
an interval with rational endpoints containing an upper bound and at 
least one number of the set; we then carry out a sequence of bisections, 
where after each bisection we choose the subinterval as far as possible to 
the right still containing numbers of the set. The resulting sequence of 
intervals is then seen to be a nested set which is the desired least upper 
bound. 
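
The bisection procedure just described can be sketched in Python with exact rational arithmetic. This is only an illustration under stated assumptions: the set is represented by a hypothetical oracle `has_element_at_least(m)` that reports whether the set contains some number not less than m, and the example set (all rationals x with x < 0 or x·x < 2) is chosen here for illustration and does not come from the text.

```python
from fractions import Fraction

def nested_intervals(has_element_at_least, lo, hi, steps):
    """Bisect [lo, hi] `steps` times, always keeping the rightmost half
    that still contains a number of the set.

    Assumptions: `hi` is an upper bound of the set, the set has an
    element >= lo, and `has_element_at_least(m)` is an oracle telling
    whether the set contains some element >= m.
    """
    intervals = [(lo, hi)]
    for _ in range(steps):
        mid = (lo + hi) / 2
        if has_element_at_least(mid):
            lo = mid   # right half still contains numbers of the set
        else:
            hi = mid   # mid is itself an upper bound
        intervals.append((lo, hi))
    return intervals

# Example set: all rationals x with x < 0 or x*x < 2.
oracle = lambda m: m < 0 or m * m < 2
nest = nested_intervals(oracle, Fraction(1), Fraction(2), 20)
lo, hi = nest[-1]
```

Each interval in `nest` has half the length of its predecessor, so the pair of endpoint sequences satisfies the conditions for a nest of intervals stated above; here the nest represents the least upper bound of the example set, whose square is 2.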

We have now indicated various methods for introducing the "real
numbers"; for that matter, the decimal procedure already provides us
with infinitely many methods, since we have free choice of a base. So it 
is natural to ask: to what extent do all these methods lead to the same 
result? Certainly it is true that the entities we have called real numbers are 
quite different from case to case; for example, an infinite decimal 0.2... 
cannot occur if 2 is the base. But even if certain objects can occur in 
several different methods, it is by no means necessary for them to have the 
same meaning in the different methods; for base 3, for example, the 
infinite “decimal” 1.111... = 3/2, whereas for base 10 it is =10/9. 
But in §4.5 we will show that all the domains obtained in this way can be 
mapped onto one another by isomorphisms that preserve the order (with 
respect to addition), so that in this sense there is no essential difference 
among the various systems. 
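
The dependence of an expansion on the base can be checked with a short Python sketch. It assumes, as the discussion of (55) in §4.1 suggests, that $r_n$ is the greatest integral multiple of $g^{-n}$ not exceeding r; the function name `expansion` is hypothetical.

```python
from fractions import Fraction
from math import floor

def expansion(r, g, n_digits):
    """Digits [a_0, a_1, ..., a_n] of a rational r >= 0 in base g,
    where r_n = a_0 + a_1/g + ... + a_n/g**n is the greatest integral
    multiple of g**(-n) not exceeding r."""
    digits = []
    prev = Fraction(0)
    for n in range(n_digits + 1):
        r_n = Fraction(floor(r * g**n), g**n)
        digits.append(int((r_n - prev) * g**n))  # a_n = (r_n - r_{n-1}) * g**n
        prev = r_n
    return digits
```

With this sketch, `expansion(Fraction(3, 2), 3, 5)` and `expansion(Fraction(10, 9), 10, 5)` yield the same digit sequence although the two numbers differ, which is the phenomenon described above.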


4.3. Dedekind Cuts 

Our purpose is to extend the module of rational numbers in such a way 
that every non-empty set of rational numbers bounded from above has a 
least upper bound. This problem is very similar to the requirement that led 
us to the integers, namely that every equation (22) should have a solution. 
In §2.2 we took pairs of numbers as our starting point. Analogously, we 
now consider all non-empty sets of rational numbers that are bounded 
from above and to each such set we assign a new symbol fin M.⁷⁰ Since
the least upper bound of a set M is determined by the set of upper bounds 


70 For the time being “‘fin” has no meaning whatever; only after we have introduced 
an ordering will fin M actually turn out to be the least upper bound (finis superior, or 
supremum) of the set M. The construction of such symbols as fin M is permissible only 
if we take the constructive or operational attitude toward the foundations of mathe- 
matics (see IA, §§1.4 and 10.6), in which the sets themselves are symbols. From other 
points of view we must proceed somewhat differently; for example, in order to have an 
entity which is distinct from M but naturally associated with it, we might consider the 
pairs (0, M), for which we could then introduce the abbreviation fin M.

of M, we will define the equality $\operatorname{fin} M = \operatorname{fin} M'$ as identity of the sets of
upper bounds of M and M'. For abbreviation we let S(M) denote the set
of upper bounds of M or, in other words, the set of rational numbers x
with $x \ge y$ for all $y \in M$. By a Dedekind cut we mean the pair consisting
of the set S(M) and the set of rational numbers not included in S(M).
It is easy to show that a pair of sets $M'$, $M''$ of rational numbers is a
Dedekind cut if and only if it has the following three properties: (i) every
rational number x belongs to exactly one of the two sets $M'$, $M''$; (ii) if
$x' \in M'$, $x'' \in M''$, then $x'' < x'$; (iii) the least upper bound of $M''$, provided
it exists (in the set of rational numbers), belongs to $M'$. These three
properties are often taken as the definition of a Dedekind cut, although
usually the third one is omitted, since it is quite unimportant. In fact,
such a definition may be taken as the starting point for introducing
the real numbers, but to us it seems more natural to start from the sets
M.⁷¹ An order for Dedekind cuts is immediately available if we note that
"pushing up" the least upper bound has the effect of decreasing the set
of upper bounds: thus by $\operatorname{fin} M \le \operatorname{fin} M'$ we shall mean simply
$S(M') \subseteq S(M)$. We see at once that this is actually a relation between fin M
and fin M' and not only between the sets M and M'; for if $\operatorname{fin} M = \operatorname{fin} M_1$
and $\operatorname{fin} M' = \operatorname{fin} M'_1$, then $\operatorname{fin} M \le \operatorname{fin} M'$ obviously means the same as
$\operatorname{fin} M_1 \le \operatorname{fin} M'_1$. For this relation $\le$ the properties (12), (13) are at once
clear. Thus in order to prove that we have actually defined an ordering,⁷²
it only remains to prove (15). To verify (15) for the new symbols (or
equivalently, for Dedekind cuts), we now assume that for two sets M, M'
neither $\operatorname{fin} M' = \operatorname{fin} M$ nor $\operatorname{fin} M' < \operatorname{fin} M$; that is, it is not true that
$S(M) \subseteq S(M')$. Then there exists an upper bound s of M which is not an
upper bound of M': that is, there exists $x' \in M'$ with $x' > s$, and for all
$x \in M$ we have $x \le s$. Thus $x' > x$ for all $x \in M$. For every upper bound
$s'$ of M' it is a fortiori true that $s' > x$ for all $x \in M$, and therefore
$s' \in S(M)$. Consequently $S(M') \subseteq S(M)$, so that $\operatorname{fin} M < \operatorname{fin} M'$, as
desired, since $\operatorname{fin} M = \operatorname{fin} M'$ was excluded.

The symbol fin M (which we may, if we wish, regard as the set of upper 
bounds of M, or alternatively as the corresponding Dedekind cut) will 
now be called a real number. Relaxing the restriction that M must be 
bounded from above leads to exactly one new “real number.” For if the 
sets $M'$, $M''$ are not bounded from above, then $S(M')$, $S(M'')$ are empty


71 Moreover, the definition of Dedekind cuts as pairs of sets with the above properties 
requires a total ordering, whereas our procedure can also be used in the case of a partial 
ordering; see the next footnote. 

72 That is: $\le$ is an order (in the notation of IA, §8.3). Since we have not yet defined
addition, (14) requires no attention. Moreover, as long as addition is not yet introduced, 
the property (15) of the rational numbers is required only here, namely in the proof 
of (15) for the newly introduced symbols. 




and thus equal to each other, so that $\operatorname{fin} M' = \operatorname{fin} M''$. If for this improper
real number we introduce the usual abbreviation $\infty$, then $\operatorname{fin} M < \infty$
for every other real number fin M, since the empty set is a proper subset
of any non-empty set. If we also admit the empty set $\emptyset$, we obtain exactly
one new improper real number $\operatorname{fin} \emptyset$, which is usually written $-\infty$. Since
$S(\emptyset)$ contains all the rational numbers,⁷³ we have $-\infty < \operatorname{fin} M$ for every
non-empty set M. Thus the improper real numbers $\infty$, $-\infty$ have some
importance in the ordering of the real numbers, but they play no role in
addition, as defined below. In what follows we shall disregard them
altogether.

It remains to answer the following question: when is a rational number
a equal to a real number fin M? Letting S(a) denote the set of $x \ge a$ we
have the obvious requirement $S(M) = S(a)$, which we shall take as a
definition of $\operatorname{fin} M = a$ as well as of $a = \operatorname{fin} M$. As in §2.2, we show by
simple verification of the four possible cases that comparativity is not
destroyed by this definition. It remains to show that the meaning of
$a \le a'$ and of $a < a'$ is unchanged if a, a' are replaced by real numbers
fin M, fin M' equal to them: but $\operatorname{fin} M \le \operatorname{fin} M'$ means $S(M') \subseteq S(M)$,
and therefore $S(a') \subseteq S(a)$, or $a \le a'$, as desired. Now fin M is actually
the least upper bound of the set M. For $x \in M$ we have in every case
$S(M) \subseteq S(x)$, so that $x \le \operatorname{fin} M$. On the other hand, if $\operatorname{fin} M' \ge x$ for all
$x \in M$, then $S(M') \subseteq S(x)$ for all $x \in M$. Thus $y \in S(M')$ implies $y \ge x$
for all $x \in M$ and therefore $y \in S(M)$. Consequently, $S(M') \subseteq S(M)$ and
thus $\operatorname{fin} M \le \operatorname{fin} M'$, so that fin M is actually the least upper bound.

But so far we have shown only that every non-empty set of rational
numbers bounded from above has a least upper bound; now we must
demonstrate the same property for any such set of real numbers. In order
to describe a set $\mathcal{M}$ of real numbers we require a set $\mathfrak{M}$ of non-empty sets
of rational numbers that are bounded from above: the real numbers in $\mathcal{M}$
are then the fin M with $M \in \mathfrak{M}$. We now let $\operatorname{fin} M_0$ be an upper bound of
$\mathcal{M}$, so that $\operatorname{fin} M \le \operatorname{fin} M_0$, which means that $S(M_0) \subseteq S(M)$ for all
$M \in \mathfrak{M}$. Then we form the (non-empty) set⁷⁴ $M^* = \bigcup_{M \in \mathfrak{M}} M$ or, in other
words, the set of elements x with $x \in M$ for at least one set $M \in \mathfrak{M}$, and


73 From $x \in \emptyset$ (always false!) it follows that $x \le y$ for every number y.

74 Here we have a difficulty related to the foundations of mathematics. If we adopt 
a theory of sets in which the union of any set of sets can always be formed, we must 
accept the disadvantage that such a theory, at least if it is to satisfy the other demands 
of mathematics, has not yet been shown to be free of contradictions (see IA, §7.1). On 
the other hand, we may say that a set of sets is to be formed only in a second language 
layer, and then, under certain circumstances, we can form a union of sets only in this 
second layer; in order to preserve our theorem about the least upper bound, we must 
form a language layer corresponding to every natural number and then distinguish real 
numbers of the 1st, 2nd, 3rd, ... layer (see Lorenzen [1]). A corresponding difficulty
arises in the other methods of introducing real numbers.

prove that $\operatorname{fin} M^*$ is the least upper bound of $\mathcal{M}$. In order to show that
$\operatorname{fin} M^*$ is a real number at all, we must first prove that $M^*$ is bounded
from above: but in view of $S(M_0) \subseteq S(M)$, an upper bound for $M_0$ is also
an upper bound for every set $M \in \mathfrak{M}$ and therefore also an upper bound
for $M^*$. Moreover, since every $M \in \mathfrak{M}$ is a subset of $M^*$, we have
$S(M^*) \subseteq S(M)$, so that $\operatorname{fin} M^*$ is an upper bound of $\mathcal{M}$.
We now investigate the upper bounds fin M' of $\mathcal{M}$ (where M' is a
non-empty set of rational numbers bounded from above, so that fin M'
is a real number). The inequality $\operatorname{fin} M' \ge \operatorname{fin} M$ for all $M \in \mathfrak{M}$ means that
$S(M') \subseteq S(M)$ for all $M \in \mathfrak{M}$, so that every upper bound of M' is an upper
bound of every set $M \in \mathfrak{M}$. But this simply means that every upper bound
of M' is an upper bound of $M^*$; in other words, $S(M') \subseteq S(M^*)$ and
therefore $\operatorname{fin} M' \ge \operatorname{fin} M^*$. Consequently, $\operatorname{fin} M^*$ is the least upper
bound of $\mathcal{M}$, as desired.

We come now to the introduction of addition. For the sets M of
rational numbers it is obvious what we should do: $M + M'$ is to be defined
as the set of all $x + x'$ with $x \in M$, $x' \in M'$; this set is non-empty and
bounded from above if the sets M, M' have those properties, and addition
defined in this way is clearly associative and commutative. But now again
we must define addition in the set of real numbers in such a way that
$M \to \operatorname{fin} M$ is a homomorphism. To this end we must show (as in §2.2)
that $\equiv$ is consistent with addition, where now $M \equiv M'$ simply means
$S(M) = S(M')$. So by (31) we must show that $S(M') = S(M'')$ implies
$S(M + M') = S(M + M'')$. Or instead, we may show that


(57) $S(M + M') \subseteq S(M + M'')$, if $S(M') \subseteq S(M'')$;


since the desired result immediately follows from the fact that $\subseteq$ and $\supseteq$
together mean =. In order to prove (57) we let z be an upper bound of
$M + M'$, so that $z \ge x + x'$ for $x \in M$, $x' \in M'$. Then $z - x \ge x'$ for
all $x' \in M'$, so that $z - x \in S(M')$, and thus, in view of our hypothesis
that $S(M') \subseteq S(M'')$, we have $z - x \in S(M'')$, so that $z - x \ge x''$ for all
$x'' \in M''$. Consequently, $z \ge x + x''$ for $x \in M$, $x'' \in M''$ or, in other
words, $z \in S(M + M'')$.
The addition of real numbers is now defined by 


(58) fin M + fin M’ = fin(M + M’). 
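
For finite sets of rational numbers the least upper bound is simply the maximum, so definition (58) can be checked directly in that special case. The following Python sketch is only an illustration: the helper names `set_sum` and `is_upper_bound` and the sample sets are assumptions made here, and the finite case says nothing about the infinite sets for which the construction is actually needed.

```python
from fractions import Fraction
from itertools import product

def set_sum(M, M1):
    """M + M': all sums x + x' with x in M and x' in M'."""
    return {x + x1 for x, x1 in product(M, M1)}

def is_upper_bound(z, M):
    """z belongs to S(M) exactly when z >= y for every y in M."""
    return all(z >= y for y in M)

M = {Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)}
M1 = {Fraction(-1), Fraction(1, 5)}
total = set_sum(M, M1)
# For finite sets the least upper bound is the maximum, so (58) reads
# max(M) + max(M') = max(M + M').
bound = max(M) + max(M1)
```

Here `bound` is an upper bound of `total`, and no smaller rational number is, in agreement with (58).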


When applied to rational numbers, this definition of addition agrees with
the former one, since⁷⁵ $a = \operatorname{fin}\{a\}$, $a' = \operatorname{fin}\{a'\}$, $a + a' = \operatorname{fin}\{a + a'\}$ by
(58), and $\{a\} + \{a'\} = \{a + a'\}$. The number 0 is also the neutral element
for addition of real numbers, since $\operatorname{fin} M + 0 = \operatorname{fin} M + \operatorname{fin}\{0\} =
\operatorname{fin}(M + \{0\}) = \operatorname{fin} M$. Since addition of the sets M is associative and


75 {a} is the set consisting of the single element a.

commutative, as was pointed out before, addition of the real numbers
by the definition (58) has the same properties. Thus, in order to prove that
the set of real numbers is a module, it only remains to show that every
real number fin M has an inverse. For this purpose we consider the set
M' of x' with $-x' \in S(M)$ and show that $S(0) \subseteq S(M + M')$ and also
$S(M + M') \subseteq S(0)$, so that $S(0) = S(M + M')$, and therefore
$\operatorname{fin} M + \operatorname{fin} M' = 0$, as desired. For the proof of the first inclusion we
assume $z \in S(0)$, so that $z \ge 0$. From $x \le -x'$ for all $x \in M$, $x' \in M'$
it follows that $x + x' \le z$, so that $z \in S(M + M')$. Thus we have shown
that $S(0) \subseteq S(M + M')$. For the proof of the second inclusion we assume
$z \in S(M + M')$, so that $z \ge x + x'$ for all $x \in M$, $x' \in M'$, from which it
follows that $z - x' \ge x$, so that $z - x' \in S(M)$. Since $-x'$ is an arbitrary
number from S(M), we can prove by complete induction that
$(n + 1)z - x' = z + (nz - x')$ implies $nz - x' \in S(M)$ for all natural
numbers n, and then for $x \in M$ we obtain the result that $n(-z) \le -(x + x')$
for all n. Since the order is Archimedean, it is therefore impossible that
$-z > 0$. Thus $z \ge 0$, and we have completed the proof that
$S(M + M') \subseteq S(0)$.

We have already shown on page 135 that the relation $\le$ is an ordering
of the real numbers. The monotonic law of addition now follows at once
from (57), (58) and the definition of $\le$. Consequently, by §2.5 the module
of real numbers is an ordered module. 

An ordered module in which every non-empty set bounded from above
has a least upper bound is called a complete ordered module. In such a
module every non-empty set bounded from below has a greatest lower
bound,⁷⁶ as can be seen at once: since the inequality $x \le y$, i.e., $0 \le y - x$,
implies $-y \le -x$ in view of the fact that $y - x = (-x) - (-y)$,
the mapping $x \to -x$ takes a non-empty set M bounded from
below into a non-empty set bounded from above and takes the least upper
bound of that image into the greatest lower bound of M.

Our results can now be expressed as follows: there exists a complete 
module which includes the module of the rational numbers, where by 
inclusion we mean not only that all the rational numbers occur in the 
new module but also that in it the addition and order for rational numbers 
are defined in exactly the same way as in the module of rational numbers.⁷⁷
The above proofs show that the module of rational numbers, which formed 
our starting point, could be replaced by any module with an Archimedean 


76 Thus the concept of completeness here is closely related to completeness in lattices
(see IB9, §1), with the difference that in a complete lattice every set has a greatest lower
and a least upper bound; strictly speaking, we ought to say that the module of real
numbers is "conditionally complete."

77 Compare the concept of a subgroup in IB2, §3.2.

ordering.⁷⁸ But it is essential that the ordering be Archimedean, as is
shown by the theorem:


If an ordered module is complete, its order is Archimedean.


For in a module whose ordering is non-Archimedean there exist elements
$a, b > 0$ such that $na \le b$ for all natural numbers n. Thus the set of
numbers of the form na is bounded from above, and for every upper
bound s of this set there exists a smaller one, namely $s - a$, since
$(n + 1)a = na + a \le s$ and therefore $na \le s - a$ for all natural
numbers n. Thus the set of numbers na has no least upper bound, so that
the module is not complete.

Finally, let us give a simple example of an ordered module in which the
ordering is not Archimedean. We form the pairs $(a, a')$ of integers (or of
rational or real numbers) and define $(a, a') + (b, b') = (a + b, a' + b')$.
Then it is easy to see that the pairs $(a, a')$ for which either $a > 0$ or else
$a = 0$ and $a' > 0$ form a domain of positivity. Thus $(a, a') < (b, b')$
if and only if $a < b$ or else $a = b$ and $a' < b'$.⁷⁹ Since $n(0, 1) =
\sum_{i=1}^{n} (0, 1) = (0, n)$, we have $n(0, 1) < (1, 0)$ for every natural number n.
Thus the element (0, 1) in this ordering is said to be infinitesimal⁸⁰ in
comparison with (1, 0). In IB4, §2.5 we will give an example of an ordered
field in which the ordering is not Archimedean.
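
This example is easy to reproduce in Python, because Python compares tuples lexicographically, which coincides with the ordering induced by the domain of positivity above. The following is a minimal sketch; the helper names `add` and `times` are hypothetical.

```python
def add(p, q):
    """Componentwise addition (a, a') + (b, b') = (a + b, a' + b')."""
    return (p[0] + q[0], p[1] + q[1])

def times(n, p):
    """n-fold sum p + p + ... + p for a natural number n >= 1."""
    total = p
    for _ in range(n - 1):
        total = add(total, p)
    return total

# Python compares tuples lexicographically, which matches the ordering
# induced by the domain of positivity described above.
small, big = (0, 1), (1, 0)
```

However large n is chosen, `times(n, small)` equals `(0, n)` and therefore remains below `big`, so no multiple of (0, 1) ever exceeds (1, 0).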


4.4. Fundamental Sequences 


We consider sequences $(a_n)_{n=1,2,\ldots}$, which for brevity we shall denote⁸¹
simply by a, of rational numbers $a_n$ satisfying the Cauchy criterion for
convergence:

For each rational number $\varepsilon > 0$ there exists a natural number $n_0$ such
that $|a_n - a_m| < \varepsilon$ for all $n, m \ge n_0$.

These sequences are called fundamental sequences or Cauchy sequences.
If addition and multiplication are defined by


(59) $(a + b)_n = a_n + b_n$, $(ab)_n = a_n b_n$,


78 To be sure, we sometimes mention products of the form nz (where n is a natural
number and z is an element of the module); but these products could always be
considered as sums (of n summands), so that there is no need of a multiplication for the
elements of the module.

79 The ordering here is lexicographic (see page 130, footnote 62).

80 Of course, the concept defined here has nothing whatever to do with the incorrect
use of this expression in analysis.

81 In $(a_n)_{n=1,2,\ldots}$ the n is a bound variable (cf. IA, §8.4, §2.6), as is indicated by the
sign of equality after it; another possible notation is $n \to a_n$, but then we must indicate
in some way that in the symbol $a_n$ the n is to be replaced by the natural numbers and by
nothing else. The abbreviation a is to be understood as follows: $a_n$ is the value of the
function a for the argument n, or in other words $a_n$ is the nth term of the sequence.

the set of fundamental sequences becomes a commutative ring $\mathfrak{R}$ with
unit element. Of course, we must first of all show that the sequences
$a + b$, $ab$ defined in this way are again fundamental sequences. For $a + b$
this result follows immediately from the inequality [cf. (53)]


$|(a_n + b_n) - (a_m + b_m)| = |(a_n - a_m) + (b_n - b_m)|
\le |a_n - a_m| + |b_n - b_m|$.


In order to prove the same result for ab, we require the following theorem:

For every fundamental sequence there exists a rational number s with
$|a_n| \le s$ for all n. To prove this theorem we first determine $n_0$ in such a
way that $|a_n - a_m| < 1$ for all $n, m \ge n_0$ and then set $m = n_0$. For
$n \ge n_0$ it follows from (53) that $|a_n| = |a_{n_0} + (a_n - a_{n_0})| < |a_{n_0}| + 1$.
So it is sufficient to take $s \ge |a_i|$ ($i = 1, \ldots, n_0 - 1$) and $\ge |a_{n_0}| + 1$,
as is always possible.

Thus we may consider s (>0) to have been so chosen that $|a_n|, |b_n| \le s$
for all n, and then from


$|a_n b_n - a_m b_m| = |a_n(b_n - b_m) + (a_n - a_m) b_m|
\le s\,|b_n - b_m| + s\,|a_n - a_m|$


we readily obtain the desired result that ab is a fundamental sequence.

The ring properties and the commutativity of multiplication follow
readily from (59). It is obvious that the sequences $(0)_{n=1,2,\ldots}$ and $(1)_{n=1,2,\ldots}$
are the zero element and the unit element respectively, so that we shall
denote them simply by 0 and 1.
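
The termwise operations (59) can be tried out on a concrete sequence of rational numbers. In the Python sketch below, the sequence `a` is produced by the iteration $a_{n+1} = (a_n + 2/a_n)/2$ starting from $a_1 = 1$; this particular sequence, and the names `a`, `add`, `mul`, are assumptions chosen here for illustration and do not appear in the text.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def a(n):
    """n-th term (n = 1, 2, ...) of a rational sequence approaching sqrt(2)."""
    if n == 1:
        return Fraction(1)
    prev = a(n - 1)
    return (prev + 2 / prev) / 2

def add(x, y):
    """(x + y)_n = x_n + y_n, as in (59)."""
    return lambda n: x(n) + y(n)

def mul(x, y):
    """(xy)_n = x_n * y_n, as in (59)."""
    return lambda n: x(n) * y(n)

square = mul(a, a)
```

The terms of `a` draw together very quickly, and the terms of `square` approach 2, so that the sequence `square` differs from the constant sequence 2 by terms that become arbitrarily small, although no rational number has square 2.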

Now for every fundamental sequence we construct a new symbol lim a.
Since this symbol is to mean the limit of the sequence, we shall set
$\lim a = \lim b$ if and only if $c = a - b$ is a zero sequence; that is, if for
every rational number $\varepsilon > 0$ there exists a natural number $n_0$ with
$|c_n| < \varepsilon$ for all $n \ge n_0$. In other words, if we let $\mathfrak{N}$ denote the set of zero
sequences, we are introducing into $\mathfrak{R}$ a relation $\equiv$ by setting $a \equiv b$ if and
only if $a - b \in \mathfrak{N}$; in the mapping $a \to \lim a$ two fundamental sequences
have the same image if and only if the relation $\equiv$ holds between them.
In order that we may set


(60) $\lim a + \lim b = \lim(a + b)$, $\lim a \cdot \lim b = \lim ab$,


and thus make the mapping $a \to \lim a$ into a homomorphism of the ring
$\mathfrak{R}$ onto the ring with elements lim a, it merely remains to prove that $\equiv$ is
an equivalence relation which is consistent with addition and multiplication.
For this purpose we require only the following three properties⁸² of $\mathfrak{N}$:


82 The proof of these properties can be omitted here, since it is exactly the same as
for the corresponding theorems in analysis; in ($61_3$) we require the fact, proved just
above, that every fundamental sequence is bounded.

($61_1$) $0 \in \mathfrak{N}$.
($61_2$) $a - b \in \mathfrak{N}$, if $a, b \in \mathfrak{N}$.
($61_3$) $ab \in \mathfrak{N}$, if $a \in \mathfrak{N}$, $b \in \mathfrak{R}$.


For $a \equiv a$ follows from ($61_1$), and from $a \equiv c$, $b \equiv c$ it follows by
($61_2$) that $a - b = (a - c) - (b - c) \in \mathfrak{N}$, so that $a \equiv b$, which means
that $\equiv$ is an equivalence relation. Since $a \equiv b$ (or in other words,
$a - b \in \mathfrak{N}$) implies not only $(a + c) - (b + c) \in \mathfrak{N}$ but by ($61_3$) also
$ac - bc = (a - b)c \in \mathfrak{N}$, the equivalence $\equiv$ is in fact consistent with
addition and multiplication.⁸³

Under definition (60) the set of symbols lim a becomes a commutative
ring,⁸⁴ since the homomorphism $a \to \lim a$ naturally preserves the ring
properties of $\mathfrak{R}$. We now wish to make this ring into an extension of the
field of rational numbers. Although the problem here is of exactly the same
kind as those already solved in §§2.2 and 4.3, we will now solve it in a
different way, which can be extended more easily to other cases. We first
show⁸⁵ that


(62) $u \to \lim (u)_{n=1,2,\ldots}$


is an isomorphism, or in other words, a one-to-one homomorphism of
the field of rational numbers: for it follows from $(u)_{n=1,2,\ldots} \equiv (v)_{n=1,2,\ldots}$
that $u = v$, because by the definition of $\equiv$ we have $|u - v| < 1/n$ for
every natural number n, so that $|u - v| > 0$ is impossible. The fact that
(62) is a homomorphism then follows at once from (59) and (60).
Consequently, in the following argument we no longer require the special
properties of lim a: we simply set $\lim a = u$, $u = \lim a$ if and only if
$\lim a = \lim (u)_{n=1,2,\ldots}$. Since (62) is a one-to-one mapping, the
comparativity of equality is thereby preserved, as is easily shown by separate
consideration of four cases, as in §2.2. Finally, the fact that (62) is a
homomorphism shows that addition and multiplication as defined in
(60) are identical, when applied to rational numbers, with the earlier
addition and multiplication.

The commutative ring with unit element that has thus been formed as an 
extension of the field of rational numbers is in fact a field, known as the 
field of real numbers. In §4.6 this assertion will be proved very simply by 


83 This proof is obviously valid for any commutative ring $\mathfrak{R}$ containing a subset $\mathfrak{N}$
with the properties (61) (see the concept of an ideal in IB5, §3).

84 Namely, the ring of residue classes of $\mathfrak{R}$ mod $\mathfrak{N}$ (see IB5, §3.6).

85 The u, v here are rational numbers and not sequences. The sequence $(u)_{n=1,2,\ldots}$ has
all its elements = u; it is obviously a fundamental sequence.

a general argument, but we wish to prove it here also,⁸⁶ to which end it
only remains to show that every real number $\lim a \ne 0$ has an inverse.
But since a is not a zero sequence, there exists an $s > 0$ such that for
every natural number n we can find a natural number $m \ge n$ with
$|a_m| \ge s$. On the other hand, since a is a fundamental sequence, there
exists a natural number $n_0$ with $|a_n - a_m| < s/2$ for $n, m \ge n_0$. Since
$|a_n| = |a_m + (a_n - a_m)| \ge |a_m| - |a_n - a_m|$ [see (54)], we thus have
$|a_n| \ge s/2$, so that $a_n \ne 0$ for $n \ge n_0$. The sequence a' defined by $a'_n = 1$
for $n < n_0$, $a'_n = a_n^{-1}$ for $n \ge n_0$ is easily seen to be a fundamental
sequence, in view of


$|a_n^{-1} - a_m^{-1}| = |a_n a_m|^{-1}\,|a_m - a_n| \le 4 s^{-2}\,|a_n - a_m|$ for $m, n \ge n_0$,
and since $(aa' - 1)_n = 0$ for $n \ge n_0$, we have $\lim a \cdot \lim a' = 1$.
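
The construction of the inverse sequence can be illustrated by a short Python sketch. The example sequence $a_n = 1 - 1/n$ is an assumption chosen here because its first term vanishes, so that the modification of the early terms is actually needed; the helper name `inverse` is hypothetical.

```python
from fractions import Fraction

def a(n):
    """Example fundamental sequence a_n = 1 - 1/n; note a_1 = 0."""
    return 1 - Fraction(1, n)

def inverse(seq, n0):
    """Sequence a' with a'_n = 1 for n < n0 and a'_n = 1/a_n for n >= n0.
    Assumes seq(n) != 0 for all n >= n0."""
    return lambda n: Fraction(1) if n < n0 else 1 / seq(n)

a_inv = inverse(a, 2)
```

For every n from 2 onward the product `a(n) * a_inv(n)` equals 1, so the termwise product differs from the unit sequence only in its first term, and such a difference is a zero sequence.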

A slight extension of the argument shows that for $|a_n| \ge s/2$ we can
also make the following statement: if $\lim a \ne 0$, then either there exists
an $n_0$ such that $a_n > 0$ for all $n \ge n_0$, or else there exists an $n_0$ such that
$a_n < 0$ for all $n \ge n_0$. For if there exists an $s > 0$ such that for every
natural number n we can find a natural number $m \ge n$ with $a_m \ge s$,
then we have $a_n = a_m + (a_n - a_m) > s/2 > 0$ for $n \ge n_0$; and if not,
then for arbitrary $s > 0$ there exists a natural number $n_1$ with $a_n < s/2$
for all $n \ge n_1$. Since for suitable s, $n_0$ we have already proved that
$|a_n| \ge s/2$ for all $n \ge n_0$, we now have $a_n \le -s/2$ for all $n \ge \max(n_0, n_1)$.
But the two cases are inconsistent with each other. Thus the argument also
shows that in the first case we can choose $c > 0$ such that $a_n \ge c$ for all
$n \ge n_0$, and in the second case we can arrange that $a_n \le -c$ (<0) for $n \ge n_0$.
So we see that the addition of a zero sequence to a produces no change in
these two cases: the decision as to which of the two cases occurs depends
only on lim a. The set of elements lim a for which a satisfies the
requirement of the first case is easily seen to form a domain of positivity. In the
resulting ordering of the field of real numbers $\lim a < \lim b$ now means
that $\lim a \ne \lim b$, and that there exists a natural number $n_0$ with
$a_n \le b_n$ for all $n \ge n_0$. To be consistent with the above formulation,
we really ought to have written $a_n < b_n$, but the argument shows at once
that we can also write $a_n \le b_n$. It is to be noted that $a_n < b_n$ for all
$n \ge n_0$ does not imply $\lim a < \lim b$ but merely that $\lim a \le \lim b$, a
result which follows equally well from $a_n \le b_n$ for all $n \ge n_0$.

We now prove that in the field of real numbers every non-empty set $M$
bounded from below has a greatest lower bound, and consequently that
every non-empty set bounded from above has a least upper bound. For


86 Especially because the proof given here requires an ordering for the absolute
values only, and thus remains valid for more general systems; cf. van der Waerden [2],
§§74, 75.

every real number $\lim a$ there certainly exists a rational number not
greater than $\lim a$ and also a rational number not smaller than $\lim a$, for
it was proved above that for a suitably chosen rational number $s$ we have
$|a_n| \leq s$ for all $n$, so that $-s \leq a_n \leq s$ for all $n$ and consequently
$-s \leq \lim a \leq s$. Thus the set $M$ has a rational lower bound. After
choice of any integer $g > 1$ we now denote by $r_n$, as in §4.1, the greatest
integral multiple of $g^{-n}$ ($n = 0, 1, 2, \ldots$) that is still a lower bound of $M$.
In view of the fact that the ordering of the rational numbers is Archime-
dean, the existence of $r_n$ follows from the facts that every number less than
a lower bound of $M$ is again a lower bound and that every lower bound lies
below some rational number, which may simply be any number that is not
smaller than some number in $M$. For $m > n$ the number $g^{-n}$ is an integral
multiple of $g^{-m}$ because $g^{-n} = g^{m-n} g^{-m}$, and thus we have


(63) $r_n \leq r_m < r_n + g^{-n}$ for $m > n$.


It follows that $|r_m - r_n| < g^{-n_0}$ for $m, n > n_0$, and since §4.1 shows that
$g^{-n_0}$ can be made smaller than any given positive number, the sequence
$r = (r_n)_{n=1,2,\ldots}$ is a fundamental sequence. Furthermore, it follows from
(63) that


(64) $r_n \leq \lim r \leq r_n + g^{-n}$ for all $n$.


Now if $\lim a$ is a number such that $\lim a < \lim r$, or in other words if
$d = \lim r - \lim a > 0$, let us determine a natural number $n$ with $g^n > d^{-1}$
(as is possible, since there exists a rational number $> d^{-1}$) and therefore
with $\lim a - \lim r < -g^{-n}$. By (64) we then have $\lim a < r_n$, so that
$\lim a$ cannot be a member of $M$ and consequently $\lim r$ is a lower bound
of $M$. On the other hand, if $\lim a$ is such that $\lim r < \lim a$, then in the
same way $g^{-n} < \lim a - \lim r$. But $r_n + g^{-n}$ is certainly not a lower
bound of $M$; i.e., there exists a real number $\lim b$ with $\lim b \in M$ and
$\lim b < r_n + g^{-n}$. Thus we have $\lim b < \lim a + r_n - \lim r$ and there-
fore, by (64), $\lim b < \lim a$. Consequently, no real number $> \lim r$ is a
lower bound of $M$, so that $\lim r$ is the greatest lower bound of $M$, as
desired.
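
The numbers $r_n$ of this proof can be computed for a concrete set. The sketch below is only an illustration under assumptions that are not in the text: the set is taken to be $M = \{x > 0 \text{ rational} : x^2 > 2\}$, the base is $g = 10$, and the names `is_lower_bound` and `r` are hypothetical. The search starts at 0, which is a lower bound of this particular $M$.

```python
from fractions import Fraction

def is_lower_bound(q):
    """True if the rational q is a lower bound of M = {x > 0 : x*x > 2} (assumed example set)."""
    return q <= 0 or q * q <= 2

def r(n, g=10):
    """Greatest integral multiple of g**-n that is still a lower bound of M."""
    step = Fraction(1, g ** n)
    k = 0                      # 0 is a lower bound of this M, so the search can start here
    while is_lower_bound((k + 1) * step):
        k += 1
    return k * step

# Digits a_n = (r_n - r_{n-1}) * g**n of the resulting expansion, for n = 1, 2, 3.
digits = [int((r(n) - r(n - 1)) * 10 ** n) for n in range(1, 4)]
```

The values `r(0)`, `r(1)`, `r(2)`, ... are the truncated decimal expansions of the greatest lower bound, and they satisfy the inequality (63) term by term.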

As in §4.1, we see that $a_n = (r_n - r_{n-1}) g^n$ ($n = 1, 2, \ldots$) is a nonnegative
integer $< g$, and that with $r_0 = a_0$ we have $r_n = \sum_{k=0}^{n} a_k g^{-k}$, which shows how
our present development is connected with decimal expansions. In general,
it is easy to see that for every fundamental sequence $a$ the real number
$\lim a$ is in fact the limit of the sequence, so that in our case we may write
$\lim r = \lim_{n \to \infty} r_n = \sum_{n=0}^{\infty} a_n g^{-n}$ in the usual notation.

It is also easy to prove directly that the Cauchy criterion for convergence
is valid for a sequence of real numbers $\lim a^{(n)}$ ($n = 1, 2, \ldots$). For let us
determine, corresponding to each value of $n$, a natural number $n'$ such that
$|a^{(n)}_{n'} - \lim a^{(n)}| < 1/n$. Then, from the hypothesis of the Cauchy
criterion that for $\varepsilon > 0$ there exists a natural number $n_0$ such that
$|\lim a^{(n)} - \lim a^{(m)}| < \varepsilon$ for $n, m > n_0$, we see that the sequence
$b$ of $b_n = a^{(n)}_{n'}$ is fundamental and that $\lim b$ is the limit of the sequence
of the numbers $\lim a^{(n)}$. Since this proof makes use of the ordering only
for absolute values, it is more general than the usual theorem for the
Cauchy criterion, which is based on the existence of a greatest lower and
a least upper bound.


4.5. Isomorphisms 


In §4.1 and also in §4.3 and §4.4 we have constructed complete ordered 
modules containing the module for the rational numbers. We now wish 
to show that any two such modules can be mapped onto each other by an 
order-preserving isomorphism leaving all the rational numbers fixed, 
where by an order-preserving mapping f (which may, in particular, be 
an isomorphism) we mean a mapping that is monotone increasing; that is, 
if x < y, then f(x) < f(y) for all x, y. To do this we first show that the 
completeness of a module is equivalent in the following sense to a certain 
maximal property. 


A module with Archimedean ordering that contains the rational numbers 
is complete if and only if it is not contained in a larger module with Archime- 
dean ordering. 


For if such a module $\mathfrak{M}$ is not complete, then it can be extended by the
procedure of §4.3 to a module with Archimedean ordering. Consequently,
if $\mathfrak{M}$ is not contained in a larger module with Archimedean ordering, then
$\mathfrak{M}$ is complete. On the other hand, if $\mathfrak{M}$ is complete, and if $\mathfrak{M}'$ is a module
with Archimedean ordering that contains $\mathfrak{M}$, then $\mathfrak{M}$ and $\mathfrak{M}'$ coincide, as
can be proved in the following way:

Let $a'$ be an element of $\mathfrak{M}'$, let $M$ be the set of rational numbers $< a'$
and let $a$ be the least upper bound of $M$, which is certainly in $\mathfrak{M}$.

We now require the following lemma: if $x, y$ with $x < y$ are elements
of a module with Archimedean ordering that contains the rational
numbers, then the module also contains a rational number $r = m/n$ with
$x < r < y$. For the proof we simply determine the natural number $n$ and then
the natural number $m$ such that $1/n < y - x$ and $(m - 1)/n \leq x < m/n$:
for then we have


$m/n = (m - 1)/n + 1/n < x + y - x = y$.
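
The two choices in this proof translate directly into a short computation. This is a minimal sketch, assuming exact rational inputs so that the comparisons are exact; the function name `rational_between` is hypothetical, and $m$ is allowed to be any integer here so that negative $x$ is covered as well.

```python
from fractions import Fraction
from math import floor

def rational_between(x, y):
    """Return m/n with x < m/n < y, following the two steps of the lemma."""
    if not x < y:
        raise ValueError("need x < y")
    n = floor(1 / (y - x)) + 1     # guarantees 1/n < y - x
    m = floor(n * x) + 1           # guarantees (m - 1)/n <= x < m/n
    return Fraction(m, n)

q1 = rational_between(Fraction(1, 3), Fraction(1, 2))
q2 = rational_between(Fraction(-5, 4), Fraction(-6, 5))
```

For rational inputs the midpoint would of course also do; the point of the sketch is that only the Archimedean property (the existence of $n$ and $m$) is used.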


Thus for every element $x < a'$ in $\mathfrak{M}$ we can find a rational number $r$
with $x < r < a'$, so that $x$ is not an upper bound of $M$ and therefore
$x \neq a$. On the other hand, if $x$ is an element of $\mathfrak{M}$ with $x > a'$, there
exists a rational number $r$ with $a' < r < x$, so that $r$ is a smaller upper
bound of $M$ than $x$, and thus again $x \neq a$. But since $x \neq a'$ implies
either $x < a'$ or $a' < x$, we have shown that $a' \neq a$ is impossible; in
other words, $a' = a$, and thus every element of $\mathfrak{M}'$ also belongs to $\mathfrak{M}$,
which completes the proof of the stated maximal property.

We now let $\mathfrak{M}$ denote any complete ordered module that contains the
rational numbers, and we map $\mathfrak{M}$ in the following way onto the complete
module, now denoted by $\mathfrak{M}_0$, that was constructed in §4.3:


(65) $x \to \operatorname{fin} M_x$,


where $M_x$ is the set of rational numbers $< x$. If $x < y$, then, as already
shown, there exists a rational number $r$ with $x < r < y$ and also a
rational number $s$ with $r < s < y$, so that $r$ is in $S(M_x)$ but not
in $S(M_y)$. On the other hand, since it is obvious that $M_x \subseteq M_y$
and thus $S(M_y) \subseteq S(M_x)$, we see that $S(M_y) \subsetneq S(M_x)$. Consequently,
$x < y$ implies $\operatorname{fin} M_x < \operatorname{fin} M_y$, so that the mapping (65) is
monotone increasing, and therefore one-to-one. That (65) is an
isomorphism (with respect to addition) can be seen as follows:
Since $r < x$, $s < y$ implies the inequality $r + s < x + y$, we have
$M_x + M_y \subseteq M_{x+y}$. Now in order to show $M_{x+y} \subseteq M_x + M_y$, we choose
a rational number $t < x + y$ and then a rational $t'$ with $0 < t' < x + y - t$,
so that $t < x + y - t'$. But there exist rational numbers $r, s$ with
$x - t'/2 < r < x$, $y - t'/2 < s < y$, which implies that $x + y - t' <
r + s$. Taking these inequalities together we see that $t < r + s$, $r < x$,
$s < y$. Thus for $r' = r - (r + s - t)/2$, $s' = s - (r + s - t)/2$ we have
$r' \in M_x$, $s' \in M_y$, $t = r' + s'$ and therefore $t \in M_x + M_y$. Consequently,
$M_{x+y} \subseteq M_x + M_y$ and thus $M_{x+y} = M_x + M_y$. In view of (58) we have
therefore proved that $\operatorname{fin} M_x + \operatorname{fin} M_y = \operatorname{fin} M_{x+y}$, which means that
(65) is actually an isomorphism with respect to addition. Thus the entire
set of numbers $\operatorname{fin} M_x$ (with $x \in \mathfrak{M}$) also forms a complete ordered module
$\mathfrak{M}'$. But this module contains all the rational numbers, since for a rational
number $r$ we have $S(M_r) = S(r)$ and therefore $\operatorname{fin} M_r = r$. By the
maximal property proved at the beginning of this section, it follows that
$\mathfrak{M}' = \mathfrak{M}_0$, so that the mapping (65) of $\mathfrak{M}$ onto the module $\mathfrak{M}_0$ is an
order-preserving isomorphism which leaves the rational numbers fixed.
Now if $\mathfrak{M}_1, \mathfrak{M}_2$ are two complete ordered modules that contain the
rational numbers, we can first map $\mathfrak{M}_1$ by (65) onto $\mathfrak{M}_0$ and then map
$\mathfrak{M}_0$ onto $\mathfrak{M}_2$ by the inverse of the mapping (65) from $\mathfrak{M}_2$ into $\mathfrak{M}_0$. Thus
we have the following important result:


Two complete ordered modules that contain the rational numbers can be 
mapped onto each other by a mapping which is isomorphic and order- 
preserving and leaves the rational numbers fixed. 


Moreover, there can be only one such mapping; for (65) is the only
order-preserving mapping of $\mathfrak{M}$ onto $\mathfrak{M}_0$ that leaves the rational numbers
fixed. To show this we need only note that by §4.3 the set $M_x$ has the least
upper bound $\operatorname{fin} M_x$ in $\mathfrak{M}_0$, and, on the other hand, has the least upper
bound $x$ in $\mathfrak{M}$, since by the lemma at the beginning of this section, for
every element $y \in \mathfrak{M}$ with $y < x$ there exists a rational number $r \in M_x$
such that $y < r$. But an order-preserving mapping of $\mathfrak{M}$ onto $\mathfrak{M}_0$ must map
the least upper bound of $M_x$ in $\mathfrak{M}$ onto the least upper bound of $M_x$ in
$\mathfrak{M}_0$, and must therefore map the element $x$ of $\mathfrak{M}$ onto the real number
$\operatorname{fin} M_x$.

This result shows that it makes no difference which of the above modules 
(all of them are ordered and complete and contain the rational numbers) 
is called the module of the real numbers. For any two of them there is a 
uniquely determined mapping of one onto the other which leaves the 
rational numbers fixed and preserves order and addition. 


4.6. Multiplication 

In §4.4 we have already defined multiplication for the real numbers, but 
we now wish to define multiplication independently of the particular 
construction chosen there for the module of real numbers. Such a definition 
is made possible by the existence of monotone endomorphisms. If $c \neq 0$,
the mapping $x \to cx$ is a monotone endomorphism (as in §2.5). On the
other hand, any monotone endomorphism $f$ is already uniquely determined
by $f(1)$; for if $g$ is another monotone endomorphism with $f(1) = g(1)$,
then by §3.5 it follows that $f(x) = g(x)$ for all rational numbers.87 But then
$f(x) = g(x)$ for every real number $x$; for let $M_x$ again denote the set of
rational numbers $< x$, so that $x$ is the least upper bound of $M_x$. Then
if the mapping $f$ is monotone increasing, $f(x)$ must be the least upper
bound of $f(M_x)$ (= the set of $f(r)$ with $r \in M_x$), and if $f$ is monotone
decreasing, then $f(x)$ must be the greatest lower bound of $f(M_x)$. Conse-
quently, just as in §3.5 for the rational numbers, the mappings $x \to cx$
are the only monotone endomorphisms of the module of real numbers.
Thus, in exactly the same way as by (40) for the integers, we can define
multiplication for the real numbers by setting $f(x) = f(1)x$, where $f$ is a
monotone endomorphism or is the zero mapping ($x \to 0$). This definition
involves only the concepts of addition (since $f$ is an endomorphism) and
order (since $f$ is monotone) and the number 1, so that the isomorphisms of


87 Of course, we cannot merely cite the above theorem but must prove it anew by 
the same method, since now f(x), g(x) are no longer required to be rational numbers. 

88 As G. Hamel has shown (“Eine Basis aller Zahlen und die unstetigen Lösungen der
Funktionalgleichung f(x + y) = f(x) + f(y),” Math. Ann. 60, 459-462, 1905), there
exist other (nonmonotone) endomorphisms; these are all discontinuous and even
unbounded in every neighborhood of any number, so that in fact they are extremely
strange functions.

the complete ordered modules considered in §4.5 remain isomorphisms with
respect to multiplication. The field of real numbers is uniquely determined,
up to order-preserving isomorphic mappings, by the requirement that it be
a complete ordered field containing the rational numbers, completeness for
an ordered field being defined in exactly the same way as for an ordered
module.

By making use of endomorphisms, it is very easy to prove the fact,
already proved in §4.4, that every real number $\neq 0$ has an inverse with
respect to multiplication; in other words, that we are actually dealing here
with a field. Since the endomorphism $x \to cx$ ($c \neq 0$) is monotone and there-
fore one-to-one, it maps the module of real numbers isomorphically onto a
module $\mathfrak{M}$, which is also ordered and complete. But then the module
of the rational numbers is mapped onto an isomorphic module which has
all the properties of the module of the rational numbers and therefore
differs from it only in the names given to the elements. Thus by the theorem
of maximality in §4.5, the set $\mathfrak{M}$ contains all the real numbers, including 1
in particular, so that there exists a number $x$ with $cx = 1$.

Up to now the proof of the existence of a monotone endomorphism $f$
with $f(1) = c$ ($\neq 0$) has been taken over from §4.4. But we can give an
independent proof on the basis of the following definition: for $c, x > 0$ we
define $f(x)$ as the least upper bound of the set of all products $rs$ of positive
rational numbers with $r < c$, $s < x$ and then set $f(0) = 0$, $f(-x) = -f(x)$;
then the mapping $-f$ is seen to be a monotone endomorphism with
$(-f)(1) = -c$. Since zero and the positive and negative numbers must
now be treated as separate cases, the proofs for the rules of calculation will 
be considerably longer than our earlier proofs. Of course, this difficulty 
can be avoided if we first construct only the positive rational numbers 
(see the end of §3.2), proceed from these to the positive real numbers, 
and then construct the real numbers in the form a — a’ as in §2.2. Then 
the multiplication of positive real numbers is defined as in the present 
section, and the simplest subsequent procedure is to define the multipli- 
cation of real numbers in accordance with (43), by the method indicated 
at the end of §2.4. 

It is obvious that the monotone endomorphism $x \to cx$ is also a
homomorphism with respect to multiplication if and only if $c(xy) = cx \cdot cy$
for all $x, y$; that is, if $c^2 = c$ or, since $c \neq 0$, if $c = 1$. Thus there is only
one monotone isomorphism of the field of real numbers into itself, namely,
the identical mapping. Now any isomorphism of the field of real numbers
into itself must be order-preserving, since we shall see in §4.7 that every
positive real number is a square (i.e., is of the form $x^2$), while on the other
hand a number $x^2 \neq 0$ must be positive in an ordered ring. Thus the
domain of positivity must consist exactly of the elements $x^2 \neq 0$ and will
therefore, in view of the equation $f(x^2) = f(x)^2$, be mapped into itself
by any isomorphism $f$ of the field of real numbers; but if $x < y$, or in
other words $0 < y - x$, then $0 < f(y - x) = f(y) - f(x)$, or $f(x) < f(y)$,
so that the isomorphism is order-preserving as stated. An isomorphism
of a field onto itself is called an automorphism of the field. Thus, like the
field of rational numbers, the field of real numbers admits exactly one
automorphism, namely the identical mapping. In IB7, §6, and IB8, §1.2 we
will find examples of fields that admit other automorphisms as well.


4.7. Roots 


We shall show that after choice of any natural number $n$ in the field
of real numbers any number $a \geq 0$ is the nth power of exactly one real
number $x \geq 0$; in other words, there exists exactly one number $x$ with
$x^n = a$, $x \geq 0$. For if $0 \leq x < y$, complete induction on $n$ shows that
$x^n < y^n$ by the monotone law of multiplication, which proves the unique-
ness. The existence follows from the intermediate-value theorem (IB8, §2.1),
since on the one hand $0^n \leq a$ and on the other $(1 + a/n)^n > a$. The
number $x$, uniquely determined in this way by $x^n = a$, $x \geq 0$, is denoted
by $\sqrt[n]{a}$ and is called the nth root of a; for $n = 2$ we write $\sqrt{a}$ instead of $\sqrt[2]{a}$.
For $n = 2k$ ($k$ a natural number) the assumption $a \geq 0$ is necessary,
since $x^{2k} = (x^k)^2 \geq 0$; and discarding the requirement $x \geq 0$ would
mean that the uniqueness of the solution of $x^n = a$ is lost, since both
$\sqrt[n]{a}$ and $-\sqrt[n]{a}$ satisfy the equation. The fact that solutions of an algebraic
equation are also called roots of the equation seems to have led to
misunderstanding and to the undesirable practice (for $n = 2k$) of calling
both $\sqrt[n]{a}$ and $-\sqrt[n]{a}$ the nth root of a, or even of writing both $\sqrt{4} = 2$
and $\sqrt{4} = -2$, without taking into account the fact that if we are to
allow many-valued expressions of this sort, the comparativity of equality
is lost (since otherwise $\sqrt{4} = 2$, $\sqrt{4} = -2$ would imply $2 = -2$).
Let us again illustrate the language adopted here (which is quite common):
the equation $x^2 - 4 = 0$ has the two roots $2$ and $-2$; but the square root
of 4 is $2$ and not $\pm 2$. Let us note the equation


(66) $\sqrt[2k]{a^{2k}} = |a|$,


in which the sign for the absolute value is often quite wrongly omitted;
the proof of this equation follows at once from $|a| \geq 0$ and $|a|^{2k} = a^{2k}$.
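
The existence argument at the start of this section (the nth power is at most $a$ at 0 and exceeds $a$ at $1 + a/n$) can be made concrete by bisection. This is a sketch only, not the proof given in the text: the function name `nth_root_bounds` and the number of steps are assumptions, and exact rationals are used so that the bracketing inequalities can be checked exactly.

```python
from fractions import Fraction

def nth_root_bounds(a, n, steps=40):
    """Bisect [0, 1 + a/n] to bracket the unique x >= 0 with x**n == a (a >= 0)."""
    a = Fraction(a)
    lo, hi = Fraction(0), 1 + a / n      # invariant: lo**n <= a < hi**n
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid ** n <= a:
            lo = mid
        else:
            hi = mid
    return lo, hi

lo2, hi2 = nth_root_bounds(2, 2)   # brackets the square root of 2
lo8, hi8 = nth_root_bounds(8, 3)   # brackets the cube root of 8
```

Because $x \to x^n$ is strictly increasing for $x \geq 0$, the bracket can contain only one solution, which mirrors the uniqueness part of the argument.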


This use of the root sign in the field of real numbers is to be distinguished
from its frequent use (as in IB7, §2) in algebraic extensions of fields. For a
rational number $a$ the symbol $\sqrt[n]{a}$ denotes an element $\alpha$ of an extension of the
field of rational numbers which satisfies the equation $\alpha^n = a$. But even in a
prescribed extension the value of $\alpha$ is in general not uniquely determined by
the equation $\alpha^n = a$: thus by $\sqrt[n]{a}$ we mean an arbitrary one of these elements,
which is to remain fixed during a given investigation. The results obtained
for $\sqrt[n]{a}$ remain valid if we replace $\sqrt[n]{a}$ by any of the other elements $\alpha$ (from
the extension in question) with $\alpha^n = a$. For example, for $\sqrt{-3}$ we may take
either the complex number $i\sqrt{3}$ (see IB8, §1) or the complex number $-i\sqrt{3}$,
provided the extension in question is contained in the field of complex numbers
(which is not necessarily the case).


If $n = 2k - 1$ ($k$ a natural number) we may allow $a < 0$ in the definition
of the nth root, provided the restriction $x \geq 0$ is also discarded; for
$x \to x^n$ now takes positive numbers into positive numbers and negative
numbers into negative numbers, and $(-x)^n = -a$ means the same
as $x^n = a$. In particular, we have $\sqrt[3]{-1} = -1$, while $\sqrt{-1}$ is not
defined. In contrast to (66), we have


(67) $\sqrt[2k-1]{a^{2k-1}} = a$,


the proof of which is an obvious consequence of the identity $a^{2k-1} = a^{2k-1}$.

In view of the general validity (provided $a$ is nonnegative for even $n$)
of the equation $(\sqrt[n]{a})^n = a$, we are led by (66) and (67) to conjecture
that more generally


(68) $\sqrt[n]{a^m} = \begin{cases} (\sqrt[n]{a})^m, & \text{for odd } n, \\ (\sqrt[n]{|a|})^m, & \text{for even } n \text{ with } a^m \geq 0, \end{cases}$


where $m$ is an integer and $a \neq 0$ for $m < 0$. Since in the second case the
expression on the right side is obviously $\geq 0$, we need only prove that
in both cases the nth power of the right expression is $= a^m$, which is
easily proved from (49).

Further rules for calculation with roots are


(69) $\sqrt[n]{ab} = \sqrt[n]{a}\,\sqrt[n]{b}$, if $n$ is odd or $a, b \geq 0$,
(70) $\sqrt[mn]{a} = \sqrt[m]{\sqrt[n]{a}}$, if $m, n$ are odd or $a \geq 0$.


Since the right sides are obviously $\geq 0$ for even $n$, we need only show,
from (50) and (49), that the nth power of the right side of (69) is $= ab$ and
the mnth power of the right side of (70) is $= a$.

For $a > 0$ we also obtain from (70), for natural numbers $n, h$ and
arbitrary integer $m$, that


(71) $(\sqrt[nh]{a})^{mh} = (\sqrt[n]{a})^m$.


For $m/n = m'/n'$ with integers $m, m'$ and natural numbers $n, n'$ we obtain
from (71) that


$(\sqrt[n]{a})^m = (\sqrt[nn']{a})^{mn'} = (\sqrt[nn']{a})^{m'n} = (\sqrt[n']{a})^{m'}$.

Thus, the expression $(\sqrt[n]{a})^m$ depends only on $a$ and on the fraction $m/n$
and may therefore be denoted by $a^{m/n}$ (to be read as a to the m/nth power);
since $\sqrt[1]{a} = a$, this definition agrees with the definition of powers for
integral exponents. In particular, for $a > 0$ we can write $\sqrt[n]{a}$ in the
form $a^{1/n}$. It is easy to show that the rules (47), (49), (50) hold for arbitrary
rational exponents, provided we assume that the real numbers $a, b$ are
positive.


The validity of (47) for arbitrary rational exponents $r, s$, i.e.,

$a^{r+s} = a^r a^s$,


means that the mapping $r \to a^r$, defined by the positive real number $a$ (and
denoted below by $f$) is a homomorphic mapping of the group of rational
numbers under addition (more concisely, the additive group of the rational
numbers) into the group of positive real numbers under multiplication (the
multiplicative group of the positive real numbers). The fact that the positive
real numbers form a group under multiplication is merely a special case of the
more general fact that the domain $P$ of positivity of an ordered field is a group
with respect to multiplication: for if $a, b \in P$, then $ab \in P$ by (44); and if $a \in P$,
then $a^{-1} = a(a^{-1})^2 \in P$ by (44), since $(a^{-1})^2 \in P$ by a remark near the beginning
of §3.4.

We now assume $a > 1$. For natural numbers $n, m$ we then have $a^{m/n} > 1$;
for if $a^{m/n} \leq 1$, it would follow from the monotonic law of multiplication
that $a^m \leq 1$, whereas in fact $a^m > 1$ follows from $a > 1$ by the same monotonic
law. Setting $r - s = m/n$, we again obtain from this monotonic law
that $a^r = a^{m/n} a^s > a^s$ if $r > s$. Thus if $a^r = a^s$, both $r < s$ and $r > s$ are
impossible, so that $r = s$. Consequently the mapping is one-to-one and is
even an order-preserving isomorphism: namely, the real numbers $a^r$ are in
one-to-one correspondence with the rational numbers $r$, and act in exactly
the same way with respect to order and multiplication as the rational numbers
with respect to order and addition. Thus in view of the monotonic law of
multiplication and the fact that a set of positive numbers can have only positive
upper bounds, the multiplicative group of positive real numbers is a complete
ordered module. To be sure, this module does not contain all the rational
numbers, but (by what has just been proved) the $a^r$ do constitute a set of numbers
which acts exactly like the entire set of rational numbers with respect to order
and to the given operation (which is now multiplication instead of addition).
Thus the method of proof in §4.5 can be used to extend the isomorphism $f$
to an isomorphism of the additive group of all the real numbers onto the
multiplicative group of the positive real numbers: to do this, given any real
number $x$, we merely define $f(x)$ in accordance with (65), namely as the least
upper bound of the set of all numbers $f(r) = a^r$ (with rational $r < x$). Writing $a^x$
instead of $f(x)$, we have


$a^{x+y} = a^x a^y$


for all real numbers $x, y$. The function $x \to a^x$ thus defined is called the
exponential function with base a; as an order-preserving isomorphism it is
monotone increasing and thus determines an inverse function on the set of
positive real numbers; this inverse function is called the logarithm89 to the base a
(abbreviated: ${}^a\log$ or also $\log_a$). Of course, this function is again an (order-
preserving) isomorphism of the multiplicative group of the positive real
numbers onto the additive group of the real numbers; that is,


${}^a\log xy = {}^a\log x + {}^a\log y$.


In making a computation we may therefore replace multiplication by addition, 
a fact which explains the practical importance of logarithms. 
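
The replacement of multiplication by addition can be checked numerically. The following is only a floating-point illustration, not part of the construction above: the helper name `log_base` and the sample values are assumptions, and the standard-library function `math.log(x, a)` is used for the logarithm to the base $a$.

```python
import math

def log_base(a, x):
    """Logarithm of x to the base a, via the standard library."""
    return math.log(x, a)

a, x, y = 10.0, 37.0, 12.5
# Multiply x and y by adding their logarithms and exponentiating the sum.
product_via_logs = a ** (log_base(a, x) + log_base(a, y))
# For a base below 1 the logarithm is decreasing rather than increasing.
decreasing_for_small_base = log_base(0.5, 2.0) > log_base(0.5, 4.0)
```

The result agrees with the direct product only up to rounding error, which is the price of working with finitely many digits.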

If for a we choose a positive real number < 1, the foregoing results remain 
valid, except that now the exponential function and the logarithm (to the base a) 
are monotone decreasing rather than increasing. 


4.8. Uncountability 


A set is said to be countable (cf. IA, §7.3) if either it is finite or else is
equivalent to the set $N$ of natural numbers90 (see §1.5); otherwise the set
is said to be uncountable. A non-empty set is countable if and only if its
elements can be represented as the terms of an (infinite) sequence. For if $M$
is infinite, then a one-to-one mapping of $N$ onto $M$ provides such a
sequence, and if $M$ is finite and is therefore the image under the mapping
$n \to a_n$ of the segment $A_m$ (see §1.5), then after choice of any $a \in M$ the
mapping $n \to b_n$, with $b_n = a_n$ for $n \leq m$ and $b_n = a$ for $n > m$, has the
desired property. On the other hand, if we assume that $M$ consists of all
the terms $a_n$ ($n = 1, 2, \ldots$) of a sequence and is not finite, then we can
define a mapping $f$ of $N$ into $M$ recursively as follows [cf. (4')!]:

$f(1) = a_1$, $f(n + 1) = a_k$, where $k$ is the smallest natural number91
with $a_k \neq f(1), \ldots, f(n)$.

This mapping is obviously one-to-one, since $f(m) \neq f(n + 1)$ for
$m < n + 1$. By complete induction on $m$ we now show that every element
$a_m$ in $M$ occurs as an image in the mapping $f$; for by the definition of $f$ it
follows from $a_m = f(n)$ that each of the terms $a_1, \ldots, a_{m-1}$ has one of the
values $f(1), \ldots, f(n - 1)$ and thus if $a_{m+1} \neq f(1), \ldots, f(n)$, the equation
$a_{m+1} = f(n + 1)$ must hold, which completes the proof that $M$ is
equivalent to $N$.
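
The recursive definition of $f$ amounts to running through the sequence and keeping each value the first time it appears. A minimal sketch follows, assuming the terms are hashable Python values; the generator name `injective_enumeration` is hypothetical.

```python
def injective_enumeration(terms):
    """Yield f(1), f(2), ...: each new value is the first term not yet used."""
    seen = set()
    for a_k in terms:
        if a_k not in seen:       # k is the smallest index whose term is new
            seen.add(a_k)
            yield a_k

values = list(injective_enumeration([1, 1, 2, 1, 3, 2, 4]))
```

Scanning in order is equivalent to the "smallest k" rule, because every term before the current position has already been emitted.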

The set of rational numbers is countable. For the proof we first represent 
all the positive rational numbers as terms of a sequence by writing them 
in the order 


1/1, 1/2, 2/1, 1/3, 2/2, 3/1, 1/4, 2/3, 3/2, 4/1, ..., 


89 It is preferable to use the word “logarithm” for the function rather than for the
value of the function. The “logarithm of x” is in fact the value of the logarithm function
for the argument x.

90 Thus a set is countably infinite if and only if it is equivalent to N.

91 If there were no such number, M would consist only of the elements f(1), ..., f(n)
and would thus be finite.

so that the nth term of the sequence is defined by $a_1 = 1$, $a_n = i/(m - i)$
for $n > 1$, if


$\sum_{k=1}^{m-2} k < n \leq \sum_{k=1}^{m-1} k$, $n = \sum_{k=1}^{m-2} k + i$.

The sequence $n \to a'_n$ with $a'_1 = 0$, $a'_{2n} = a_n$, $a'_{2n+1} = -a_n$ then comprises
the entire set of rational numbers, so that by the above argument the set
of rational numbers is countable.
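
The enumeration just described is easy to generate. The sketch below is illustrative only; the generator names `positive_rationals` and `all_rationals` are assumptions. Fractions are produced as numerator and denominator pairs first, so that repeated values such as 2/2 remain visible, exactly as in the sequence above.

```python
from fractions import Fraction
from itertools import count, islice

def positive_rationals():
    """Yield (i, m - i) for m = 2, 3, ... and i = 1, ..., m - 1: 1/1, 1/2, 2/1, 1/3, ..."""
    for m in count(2):
        for i in range(1, m):
            yield (i, m - i)

def all_rationals():
    """Yield 0, a_1, -a_1, a_2, -a_2, ... (values may repeat)."""
    yield Fraction(0)
    for num, den in positive_rationals():
        q = Fraction(num, den)
        yield q
        yield -q

first_ten = list(islice(positive_rationals(), 10))
first_five = list(islice(all_rationals(), 5))
```

Every rational number appears somewhere in `all_rationals()`, although not only once; repetitions can be removed as in the recursive definition of $f$ given earlier.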

But in contrast to the rational numbers, the set of real numbers is 
uncountable; in other words: given any sequence of real numbers, there 
exists a real number which is not a term of the sequence. 

To prove this, we represent the nth term ($n = 0, 1, 2, \ldots$) of the given
sequence of real numbers as the sum of an integer $a_{n0}$ and a proper decimal
fraction with the digits $a_{nm}$ ($< g$, $\geq 0$), where $g > 2$ is any chosen base:


(72) $a_n = a_{n0} + 0.a_{n1}a_{n2}a_{n3}\ldots$;


here, as in §4.1, we exclude the case that a number $m_0$ exists with
$a_{nm} = g - 1$ for all $m > m_0$. Then it is obvious that the $a_{nm}$ are uniquely
determined by the $a_n$. For the sequences $m \to a_{nm}$ we now consider the
diagonal sequence $n \to a_{nn}$ and form the sequence $n \to b_n$ with


$b_n = \begin{cases} 0 & \text{if } a_{nn} \neq 0, \\ 1 & \text{if } a_{nn} = 0. \end{cases}$


Since $b_n < g - 1$, the number $b = b_0 + 0.b_1b_2b_3\ldots$ is of the same form
as (72), so that $b_n \neq a_{nn}$ ($n = 0, 1, 2, \ldots$) implies the inequality $b \neq a_n$
($n = 0, 1, 2, \ldots$). Thus the real number $b$ is not a term of the given
sequence of real numbers.
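
The diagonal construction can be illustrated on a finite table of digits. This is a sketch under assumptions that are not in the text: the function name `diagonal`, the callable `digit(n, m)` standing for $a_{nm}$, and the four sample rows are all hypothetical, and only finitely many digits are inspected.

```python
def diagonal(digit, length):
    """Return b_0, ..., b_{length-1}, where digit(n, m) gives a_{nm}."""
    return [0 if digit(n, n) != 0 else 1 for n in range(length)]

# Sample rows: entry [n][m] plays the role of a_{nm} (integer part at m = 0).
rows = [
    [3, 1, 4, 1],
    [0, 0, 0, 0],
    [2, 7, 1, 8],
    [1, 4, 1, 0],
]
b = diagonal(lambda n, m: rows[n][m], 4)
```

By construction `b` differs from row `n` in position `n`, so it cannot coincide with any row of the table.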

The procedure by which $b$ is determined from the sequence $n \to a_n$ is called
the (second) Cantor diagonal procedure (see also IA, §7.3). The uncounta-
bility of the set of real numbers proved in this way appears paradoxical,
since it is certainly true that every real number must be defined in some 
way, and such a definition employs only finitely many letters of the 
alphabet, together with finitely many special symbols (such as |, by repeated 
use of which we can express all the natural numbers). So if we add the 
special symbols to the alphabet in any definite arrangement, the definitions 
of all the real numbers can be arranged in lexicographic order, in contra- 
diction to the fact that the set of real numbers is uncountable. It seems that 


92 It follows that irrational numbers (that is, real numbers that are not rational)
must exist. It is easy to give some examples directly: from the theorem in IB5, §4.4, it
follows at once that $\sqrt{n}$ is irrational for every natural number $n$ which is not the square
of a natural number; in particular, $\sqrt{2}$ is irrational.

this paradox, which is really a serious one, can be resolved only by assum-
ing that any natural language is so inexact as to lead inevitably, in extreme
cases, to a contradiction. But if we use a formal language, whose formulas
(propositions) are constructed according to exact rules prescribed in
advance, then any mapping, and in particular a mapping that might
possibly map the natural numbers onto all the real numbers, would
necessarily be constructed in terms of this formal language. Then the
uncountability of the set of real numbers would simply mean that in this
formal language no such enumeration of them can be constructed. But if
we make a suitable extension of the formal language, then all the expres-
sions in it, and in particular all the real numbers that can be described in
it, can in fact be enumerated in the new language. The concept of
countability is thus dependent on the linguistic expressions93 at our
disposal.

This explanation of the paradox is available only from the constructive 
(IA, §1.5) or the operational (IA, §10.6) point of view. From the classical 
point of view (IA, §1.5) the real numbers (or the mathematical entities 
used to define them, such as the sets of rational numbers in §4.3), exist 
independently of the way we construct them. But now there is no longer 
any paradox; we must simply recognize that it is impossible to find any 
procedure for setting down in succession the definitions of all the real 
numbers. 


Appendix to Chapter 1 


Ordinal Numbers 


The basic property of the set N of natural numbers (namely that every
non-empty subset has a first element, i.e., a smallest element; cf. IB1, §1.3)
leads us to consider arbitrary well-ordered sets and to interpret them as
ordinal numbers. The principal purpose of the ordinal numbers is, in fact,
to determine the “rank” of each element in a set of elements, and for that
purpose the well-ordered sets are exactly suited. After the sequence of
finite ordinal numbers $0, 1, 2, \ldots, n, n + 1, \ldots$ come the countable infinite
ordinal numbers $\omega, \omega + 1, \ldots$; these ordinal numbers form a well-
defined set 2, which has taken its place in mathematics alongside the set
R of real numbers.


93 In §1.5 we defined the concept of finiteness by means of a mapping; but it can be
shown that this concept is independent of the linguistic expressions employed (see
Lorenzen [1]).

1. Atomization of the Continuum as an
Introductory Example


Let $S = [0,1]$ denote the set of real numbers in the closed interval
0 to 1. We divide the interval into two subintervals with one point in
common, and then divide these subintervals similarly and so forth, each
time dividing a closed interval $J$ into two closed subintervals $fJ$. Thus $fS$
denotes the set of the two subintervals of $S$, $ffS = f^2S$ denotes the set of
the four subintervals, $f^3S$ the set of the eight subintervals, and so forth.
From the interval $S$ we obtain by the first division the subintervals $S_0$, $S_1$
with $S_0 < S_1$, from $S_0$ by the second division the intervals $S_{00}$, $S_{01}$
with $S_{00} < S_{01}$, and from $S_1$ the subintervals $S_{10}$ and $S_{11}$. In general, for
any $S_i$, where the subscript $i$ stands for a dyadic sequence of integers
already assigned, we denote by $S_{i0}$ and $S_{i1}$ the left and right subinterval,
respectively, of the interval $S_i$.

For example, we determine in this way the sequence of intervals
S_0 ⊃ S_01 ⊃ S_010 ⊃ S_0101 ⊃ S_01010 ⊃ ... in which 0 and 1 appear alternately.
The intersection of the intervals in this sequence, to be denoted
by S_01010..., is either a point or a closed interval. In the latter case we can
continue the subdivision and consider the intervals S_a0 and S_a1, where a
denotes the infinite sequence of numbers 01010... .
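
The nested chain S_0 ⊃ S_01 ⊃ S_010 ⊃ ... can be traced numerically. The following is a minimal sketch, not part of the original text: the function name subinterval and the parameter ratio are hypothetical, and it assumes that every interval is cut at a fixed fraction of its length (ratio = 1/2 is bisection).

```python
from fractions import Fraction


def subinterval(index, ratio=Fraction(1, 2)):
    """Return the closed interval S_index as a pair (left, right).

    `index` is a string of '0'/'1' digits: '0' selects the left part and
    '1' the right part. `ratio` (a hypothetical parameter) is the fraction
    of each interval assigned to its left part; 1/2 means bisection.
    """
    left, right = Fraction(0), Fraction(1)
    for digit in index:
        cut = left + ratio * (right - left)  # common point of the two parts
        if digit == "0":
            right = cut
        else:
            left = cut
    return left, right


# The chain S_0, S_01, S_010, ... with alternating digits, under bisection.
chain = [subinterval("01010101"[:k]) for k in range(1, 9)]
lengths = [right - left for left, right in chain]
```

Under bisection the k-th interval of the chain has length 1/2^k, so the intersection of the whole chain is a single point (here 1/3, whose binary expansion is 0.010101...), and no further process is needed for this chain.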

In other words, we have here a “well-ordered” set of processes P_1, P_2, ...,
where P_n leads to the 2^n intervals S_{i_1 ... i_n} with i_1, ..., i_n ∈ {0, 1}. After all
these processes P_n there comes a process P_{1,2,3,...}, which we shall call
P_ν, where ν cannot be a natural number since these have all been used. In
P_ν we consider the set of “remainders”


(1) S_{i_1 i_2 ... i_n ...},   i_1, i_2, ..., i_n, ... ∈ {0, 1}.


If this set contains at least one interval consisting of more than a single
point, we consider the next process; that is, we subdivide into fJ all the
intervals J of the form (1) that consist of more than a single point; this is
the process P_{ν+1}. Then follow the processes P_{ν+2}, P_{ν+3}, .... After all these
processes P_{ν+n} with n ∈ N comes P_{ν+ν}. In this process we consider all
intersections ∩F, where F is any system of comparable intervals that
contains an interval from each of the preceding processes. If the elements
of F are denoted by S_{i_1}, S_{i_1 i_2}, ..., S_{i_1 i_2 ... j_1}, S_{i_1 i_2 ... j_1 j_2}, ...,
it is natural to denote this intersection by


S_{i_1 i_2 ... j_1 j_2 ...},

where the index represents a “double sequence” (that is, a juxtaposition
of two sequences, one after the other). In this way we can continue the
subdivision until there is nothing more to divide, namely, until we have
arrived at all the isolated points of the reduced continuum [0, 1]. For
example, if the subdivision is always a bisection, the process P_ν is the last
one, since every set then consists of a single point. It is worth noting that
the result is the same if the subdivision fX of X is carried out uniformly;
that is, in such a way that the ratio of the lengths of the subintervals is
constant.¹ In other cases the “height” of the subdivision can become very
great; that is, the necessary number of divisions is “extremely great.” If
the subdivision of [0, 1] is undertaken in such a way that whenever possible
the half-segment [0, 1/2] lies in one of the subintervals, then the “height”
is at least equal to ν + ν. At any rate, we see that the set of processes
P_1, P_2, P_3, ... is well-ordered and that the finite ordinal numbers are not
sufficient to deal with irregular atomizations.


2. Fundamental Concepts 


2.1. A set that is ordered by a relation (cf. IA, §8.3 and §7.4) has a first
element x if no element of the set precedes x or, in other words, if there
is no element z in the set such that z < x; and similarly y is a last element
if no element of the set follows y or, in other words, if there is no element
z such that y < z. An ordered set M is said to be well-ordered if the set M
itself and each of its non-empty subsets has a first element in the prescribed
ordering of M. By definition, the empty set ∅ is also well-ordered, and the
same remark holds for every set consisting of a single element.

The set N of natural numbers in the usual order is well-ordered, which
is one of its fundamental properties. Here the ordering is the usual <
(smaller than) relation. If we order the set N by putting all the odd natural
numbers first and then all the even ones


(2) 1, 3, 5, ...; 2, 4, 6, ...;


it is still well-ordered, where by the ordering we mean the relation that
holds for the pair of numbers (m, n) if and only if m is odd and n is even
or else m, n are both even or both odd and m < n. The corresponding
remarks hold for the sequences


(3) 1, 3, 5, ...; 2·1, 2·3, 2·5, ...; 2²·1, 2²·3, 2²·5, ...; ...; 2^n·1, 2^n·3, 2^n·5, ...; ...


On the other hand, the set of natural numbers in the order


..., n + 1, n, ..., 3, 2, 1


is not well-ordered. The sets Q and R of all rational and all real numbers in
order of magnitude are not well-ordered; for example, neither of them
has a first element. However, it is easy to well-order Q.² On the other
hand, no one has yet succeeded in well-ordering the set R of all real
numbers, although this problem is closely related to many other interesting
problems; for example, the question whether for every subset M of R we
can define a permutation p_M of M such that p_M(x) ≠ x for every x ∈ M.
For if it is possible to well-order R, and therewith every M with M ⊆ R,
this question can be answered in the affirmative.


1 Since lim_{n→∞} q^n = 0 for every q with −1 < q < +1.
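
Two of the well-orderings discussed in this subsection can be written down concretely. The sketch below is illustrative only: the names odd_first_key and rationals are hypothetical, and the enumeration of Q assumes one common diagonal scheme (fractions p/q grouped by the value of p + q, each followed by its negative, with repetitions struck out), which may differ in detail from the scheme intended in the footnote.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd


def odd_first_key(n):
    """Sort key for the ordering 'all odd numbers first, then all even ones'."""
    return (n % 2 == 0, n)


def rationals():
    """Yield 0 and then every nonzero rational number exactly once.

    Assumed scheme: for n = 1, 2, 3, ... run through the fractions p/q with
    p + q = n + 1, each followed by its negative. A fraction that is not in
    lowest terms has already occurred in an earlier group and is skipped.
    """
    yield Fraction(0)
    for n in count(1):
        for p in range(1, n + 1):
            q = n + 1 - p
            if gcd(p, q) == 1:
                yield Fraction(p, q)
                yield Fraction(-p, q)


first_odd_even = sorted(range(1, 11), key=odd_first_key)
first_rationals = list(islice(rationals(), 11))
```

Here first_odd_even is [1, 3, 5, 7, 9, 2, 4, 6, 8, 10], and first_rationals begins 0, 1, −1, 1/2, −1/2, 2, −2, 1/3, −1/3, 3, −3. Since every rational receives a finite position, the enumeration well-orders Q in a way that has nothing to do with the order of magnitude.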


2.2. Among the subsets of an ordered set M the initial segments are
particularly important. A subset A of M is called a segment if for every
element a of A the set A also contains every element b that precedes the
element a in M, or in other words if a ∈ A and b < a imply b ∈ A. The empty
set and the entire set M are also called segments; all other segments are
proper segments of M.

It is important to note that if A is a proper segment of a well-ordered
set M, then the set M − A has a first element.

If we denote by (·, x)_M and (·, x]_M the sets of all y ∈ M such that
y < x and y ≤ x, respectively, then for every x ∈ M the sets (·, x)_M and
(·, x]_M are segments of M; they are sometimes called initial intervals of M,
or the segments determined by x.


2.3. For ordered sets we define similarity or isomorphism as follows.
An ordered set M is said to be similar or isomorphic to an ordered set M'
if there exists a one-to-one mapping f of M onto M' such that for every
pair m_1, m_2 of elements of M we have m_1 < m_2 if and only if fm_1 < fm_2.³


3. Simple Properties of Well-Ordered Sets 


3.1. Let W be a well-ordered set.

Theorem 1. Every subset of the well-ordered set W is well-ordered. 


This theorem follows immediately from the definition of a well-ordered 
set. 


2 If n is a natural number (n ≥ 1), let Q_n be the well-ordered set of rational fractions

1/n, −1/n, 2/(n − 1), −2/(n − 1), ..., (n − 1)/2, −(n − 1)/2, n/1, −n/1

(note that they are not in the natural order). By forming the sequence 0, Q_1, Q_2, ... and
striking out the terms that have already occurred, we obtain the well-ordered set

0, 1, −1, 1/2, −1/2, 2, −2, 1/3, −1/3, 3, −3, ...

of all rational numbers. This ordering of Q is obviously different from the natural order,
since larger elements may precede smaller ones.
3 The symbol fa or f(a) denotes the f-image of a.


Theorem 2. An ordered set W is well-ordered if and only if it contains no
infinite “decreasing sequences” (regressions).


Proof: If W has an infinite decreasing sequence a_0, a_1, a_2, ... with
a_0 > a_1 > a_2 > ..., then the set of these elements, which is a subset of W,
has no first element and therefore W is not well-ordered. Conversely, if W
is not well-ordered, we let A ⊆ W denote a non-empty subset with no
first element. Then for a_0 ∈ A there is an a_1 ∈ A with a_1 < a_0, and similarly
a_2 ∈ A with a_2 < a_1, and so forth, which leads to the infinite decreasing
sequence a_0, a_1, a_2, ....


Theorem 3. The well-ordered set W is not similar to any of its proper 
segments. 


Proof: Assuming that W is similar to a proper segment A of W, we
let f denote a similar mapping of W onto A. Now let a be the first element of
W − A; then fa < a, since fa ∈ A while a ∉ A and A is a segment; but then
also ffa = f²a < fa, f³a < f²a, and so
forth, which leads to the infinite decreasing sequence f^n a with n in N, in
contradiction to Theorem 2.


3.2. The Principle of Induction for Well-Ordered Sets 


Theorem 4. Let W be a non-empty well-ordered set. Let the set M be
such that


(1) M contains the first element of W.


(2) if x ∈ W and (·, x)_W ⊆ M, then x ∈ M (for the notation here see 2.2).
Then M ⊇ W.


Proof: If the assertion M ⊇ W were false, the set W − M would not
be empty. As a non-empty subset of W it would have a first element, call
it x. Then we would have (·, x)_W ⊆ M and thus by the induction
hypothesis (2) also x ∈ M, in contradiction to the fact that x ∈ W − M.

For W = N the principle of induction for well-ordered sets reduces to the
ordinary principle of complete induction.


3.3. Comparison of Well-Ordered Sets 


Theorem 5. If the well-ordered set A is similar to a segment of the
well-ordered set B, then there exists exactly one similar mapping f of A such
that fA is a segment of B.


Proof: Let f and g be two similar mappings of A such that fA
and gA are segments of B. Let A_0 be the maximal segment in which f and g
coincide. Then we must show that A_0 = A. Otherwise we would have
A_0 ⊂ A. Let x be the first element of A − A_0, so that fx and gx are the
first elements of B − fA_0 and B − gA_0. Now fA_0 = gA_0, so that
fx = gx, since fx and gx are the first element in one and the same set.
Consequently, f and g coincide in a larger segment of A, namely, in
A_0 ∪ {x}. But this is a contradiction, so that A_0 = A.


Theorem 6. (Fundamental theorem on well-ordered sets.) If A, B are 
well-ordered sets, then either A is similar to a segment of B, or B is similar 
to a proper segment of A. 


Proof: Let us assume that A is not similar to any segment of B, and
then prove that B is similar to a proper segment of A. Let b_0 be the first
point of B and fb_0 the first point of A. Let us then assume that X is a
segment of B which is mapped isomorphically by f_X onto a segment of A;
let x be the first element of B − X and f_X x the first element of A − f_X X;
then we have an isomorphic mapping f_X of (·, x]_B onto a segment of A.
Let M be the set of all elements x ∈ B with the property that (·, x]_B is
isomorphic to a segment of A. We see at once that the requirements (1)
and (2) of the theorem on induction are satisfied for B = W and thus
M ⊇ B. This means that every segment X of B is isomorphic to a segment
f_X X of A, where f_X is the corresponding isomorphic mapping of X. If
X ⊂ Y ⊆ B, then f_X z = f_Y z for z ∈ X, since otherwise the restriction of
f_Y to X and the mapping f_X would be two distinct isomorphic mappings
of X onto a segment of A, in contradiction to Theorem 5. Thus fz = f_X z for
every z ∈ X and every segment X of B defines an isomorphism f of B, and
therefore fB is a segment of A.

But the set fB is a proper segment of A; for if we had fB = A, then A
would be isomorphic to B, in contradiction to the assumption that A
is not isomorphic to any segment of B.


4. Definition and Simple Properties of Ordinal Numbers 


4.1. Definition 

By the fundamental theorem on well-ordered sets, two distinct well- 
ordered sets A and B are comparable to each other in the following sense. 
Either A and B are similar, or A is similar to a proper segment of B, or B is 
similar to a proper segment of A. Thus it is natural to regard the well- 
ordered sets as ordinal numbers. 

Every well-ordered set W represents an ordinal number OW⁴ under the
following conventions concerning equality and order (§4.2) and computation
with well-ordered sets (§4.3).


4.2. Equality and Order of Ordinal Numbers 
If A and B are two well-ordered sets, their ordinal numbers OA and OB
are equal if and only if A and B are similar. Thus any two similar
well-ordered sets determine one and the same ordinal number.

The order-relation OA < OB or OB > OA means that A is similar
to a proper segment of B, and OA ≤ OB means either OA < OB or
OA = OB.

It is easy to prove the transitivity of equality and order: from
OA ≤ OB ≤ OC follows OA ≤ OC, where we have OA < OC if the
symbol ≤ actually means < at least once.


4 Following Cantor, many authors write W̄ instead of OW.


4.3. Computation with Ordinal Numbers 


4.3.1. Sum of Ordinal Numbers. If A and B are two disjoint⁵
well-ordered sets, then A + B denotes the union A ∪ B so ordered that the
orders of A and B are preserved and all the elements of A precede those
of B. Then the well-ordered sum A + B determines the sum of the ordinal
numbers OA + OB. It is easy to show that the sum is independent of the
special representatives of the two ordinal numbers: for if OA = OA_1 and
OB = OB_1 and also A_1 ∩ B_1 = ∅, then we have OA + OB = OA_1 + OB_1.
More generally, let X be a well-ordered set and for every x ∈ X let f(x) be a
well-ordered set; then if


f(x) ∩ f(x') = ∅ for x, x' ∈ X, x ≠ x',


let Σ_{x∈X} f(x) be the union ∪_{x∈X} f(x), so ordered that f(x) precedes f(x')
if and only if x < x' in X.

In case the sets A and B are not disjoint, we proceed as follows. If
(A, B) is the ordered pair of well-ordered sets A and B, let {1} × A be the
set of all ordered pairs (1, a) with a ∈ A and let {2} × B be the set of all
pairs (2, b) with b ∈ B. If we now let the union {1} × A ∪ {2} × B be
lexicographically ordered, the result is the ordered sum


{1} × A + {2} × B.


This sum represents a well-ordered set which we may consider as an ordinal
number and denote by OA + OB.
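
For finite sets the tagging construction can be carried out directly. The following is only a sketch under stated assumptions: the function name ordered_sum is hypothetical, each well-ordered set is given as a list written in its own order, and only finite sets are handled.

```python
def ordered_sum(a, b):
    """Return the ordered sum {1} x A + {2} x B of two finite well-ordered sets.

    Each argument is a list written in its own order. Elements are tagged
    with 1 or 2 so that common elements stay distinct; within one tag the
    position in the original list decides which element comes first.
    """
    tagged = [(1, i, x) for i, x in enumerate(a)]
    tagged += [(2, j, y) for j, y in enumerate(b)]
    tagged.sort(key=lambda t: (t[0], t[1]))  # lexicographic: tag, then position
    return [(tag, x) for tag, _, x in tagged]


A = [0, 1, 2]
B = [2, 0]  # well-ordered with 2 before 0; B is not disjoint from A
S = ordered_sum(A, B)
```

The result S is [(1, 0), (1, 1), (1, 2), (2, 2), (2, 0)]: the common elements 0 and 2 occur twice, once with each tag, and every element of A precedes every element of B.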

More generally, if B is a well-ordered set and if to every b ∈ B there
corresponds a well-ordered set f(b), then {b} × f(b) is the set of all ordered
pairs (b, x) with x ∈ f(b). If the union ∪_{b∈B} {b} × f(b) is lexicographically
ordered, it represents a well-ordered set whose ordinal number is called
the sum of the ordinal numbers Of(b) taken over the ordinal number OB;
this sum is denoted by Σ_{b∈B} Of(b). Of course, there is no need for the f(b)
to be distinct.


5 That is, A ∩ B = ∅.


4.3.2. Multiplication of Ordinal Numbers. If B and C are well-ordered
sets with the ordinal numbers OB and OC, then the product OB · OC of
their ordinal numbers is defined as the sum Σ_{b∈B} OC. In this case f(b) = C
for every b ∈ B in the preceding definition of this sum.


4.4. Special Symbols 
If we write O(∅) = 0, where ∅ is the empty set, and then set


O{0} = 1, O{0, 1} = 2, O{0, 1, 2} = 3, ...,

we obtain the well-ordered set of ordinal numbers

0, 1, 2, 3, ..., n, n + 1, ....

This set determines the ordinal number O{0, 1, 2, ..., n, n + 1, ...}, which
is denoted by ω or ω_0.


4.5. Arithmetic of the Ordinal Numbers 

4.5.1. Addition and Multiplication. For example, 2 + 3 = 5, and
α + 0 = α for every ordinal number α.

Since the well-ordered sets


(4)   0, 1, 2, 3, ..., n, n + 1, ...
      1, 2, 3, 4, ..., n + 1, n + 2, ...


are similar under the mapping n → n + 1, we have 1 + ω = ω. On the
other hand ω + 1 > ω, since the ordinal number ω + 1 may be
represented by the well-ordered set


(5)   0, 1/2, 2/3, ..., n/(n + 1), ..., 1,


which is not similar to any segment of (4). For if there existed a similar
mapping f of (5) onto a segment of (4), then (4) would contain f(1) in
particular, and in front of f(1) would come the infinite set of elements
f[n/(n + 1)], which is impossible, since no proper segment of (4) can be
infinite, and therefore we must have ω + 1 > ω.

Consequently, 1 + ω = ω < ω + 1, so that 1 + ω < ω + 1, which
shows that addition of ordinal numbers is not always commutative.
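
The failure of commutativity can be checked mechanically for small ordinals. The sketch below is illustrative and not from the original text: the class Ordinal and its field names are hypothetical, and it covers only ordinal numbers that consist of finitely many copies of ω followed by a finite remainder.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Ordinal:
    """`copies` copies of omega laid end to end, followed by `rest` elements.

    Comparison is lexicographic on (copies, rest), which agrees with the
    order of the ordinal numbers represented in this restricted range.
    """
    copies: int = 0
    rest: int = 0

    def __add__(self, other):
        # A finite tail of `self` is absorbed by any copy of omega after it.
        if other.copies > 0:
            return Ordinal(self.copies + other.copies, other.rest)
        return Ordinal(self.copies, self.rest + other.rest)


ONE = Ordinal(0, 1)
OMEGA = Ordinal(1, 0)

left_sum = ONE + OMEGA    # 1 + omega
right_sum = OMEGA + ONE   # omega + 1
```

With these definitions left_sum equals OMEGA while right_sum is strictly greater than OMEGA, in agreement with 1 + ω = ω < ω + 1.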

The same remark holds for multiplication; for example, ω · 2 = ω (by
§4.3.2 this product is the sum of ω copies of 2), whereas
2 · ω = ω + ω > ω, so that ω · 2 ≠ 2 · ω. However, it is easy to show
that addition and multiplication are associative:


(6) (α + β) + γ = α + (β + γ),
(7) (αβ)γ = α(βγ).

We also have


(8) (α + β)γ = αγ + βγ;

but in some cases

α(β + γ) ≠ αβ + αγ,


for example, ω(1 + 1) ≠ ω · 1 + ω · 1, since ω · 2 = ω, ω + ω > ω.
In exactly the same way as for ordinary ordinal numbers < ω, we can
prove the following fundamental theorem:


For every ordered pair (α, β), β ≠ 0, of ordinal numbers there exists a
unique ordered pair (κ, ρ) of ordinal numbers such that α = κβ + ρ with
0 ≤ ρ < β.


For example, if β = 2, then every ordinal number is either of the form
κ · 2 (an even ordinal number) or of the form κ · 2 + 1 (an odd ordinal
number).


4.5.2. Subtraction. If α ≤ β, the equation α + ξ = β has exactly one
solution, which is denoted by −α + β. The number −α + β is the ordinal
number O(B − A) of the set B − A if OB = β and A is a segment of B
with OA = α.

For example, −1 + ω = ω and in general −1 + α = α for every
α ≥ ω.

Retaining the assumption α ≤ β, let us now investigate β − α; that
is, the solution of the equation ξ + α = β.

For example, β − 0 = β. But consider β − 1, and in particular
ω − 1. The number ω − 1 does not exist, since the equation ξ + 1 = ω
cannot have a solution, in view of the fact that ξ + 1 has a last element,
whereas ω has no last element.

Thus we have the following result: if β ≥ α, then the “left difference”
−α + β is a uniquely determined ordinal number; on the other hand, the
“right difference” β − α does not always exist, and when it does exist it
may have more than one value.

For example, −ω + ω = 0, but ω − ω may be any ordinal number
< ω.


Definition. An ordinal number β is said to be of the first kind (or to be
isolated) if β − 1 exists, and to be of the second kind if β − 1 does not
exist. An ordinal number of the second kind, other than zero, is also
called a limit number.

For example, 5 and ω + 1 are isolated, whereas ω and 2ω are limit
numbers.

Every number of the second kind is of the form αω.

4.5.3. Exponentiation. We can make the following inductive
definitions:


α^0 = 1 for every ordinal number α ≥ 0,
α^{β+1} = α · α^β,
α^γ = sup_{β<γ} α^β ⁶ for every non-isolated ordinal number γ ≠ 0.

For example, 2^ω = sup_{n<ω} 2^n = ω, and similarly n^ω = ω for every n
with 1 < n < ω.
It is easy to prove that

(9) α^β · α^γ = α^{γ+β},
(10) (α^β)^γ = α^{γβ}.


But in general it is not true that (αβ)^γ = α^γ β^γ.
For example: (2ω)^ω ≠ 2^ω ω^ω, since (2ω)^ω < 2^ω ω^ω; for we have


(2ω)^ω = sup_{n<ω} (2ω)^n = ω^ω < ω^{ω+1} = ω · ω^ω = 2^ω · ω^ω.


4.5.4. Monotonic Laws. The following monotonic laws (inequalities,
cancellations) are valid:
If γ > 0, then α + γ > α, and conversely.
If α < β, then γ + α < γ + β, and conversely.
If α = β, then γ + α = γ + β, and conversely.
If α < β, then α + γ ≤ β + γ (and conversely, α + γ < β + γ implies α < β),
but the relation ≤ cannot be replaced by <.
For example: although 2 < 3, nevertheless 2 + ω = 3 + ω = ω.
If α = β, then αγ = βγ.
If αγ = βγ, γ > 0, then α = β.
If α < β, γ > 0, then αγ < βγ, and
if αγ < βγ, γ > 0, then α < β.
If α < β, γ > 0, then γα ≤ γβ,


but not necessarily γα < γβ because, for example, 2 < 3, ω · 2 = ω · 3.


5. Enumeration by Means of the Ordinal Numbers 


Definition. For each ordinal number α we let I(α) denote the set of all
ordinal numbers that are smaller than α; for example, I(2) = {0, 1}. Then
I(0) is empty and I(ω) = {0, 1, 2, ...}, so that the set I(ω) has no last
element.


6 For a set M of ordinal numbers we denote by sup M or sup_{x∈M} x the smallest
ordinal number α for which M ≤ α (that is, x ≤ α for every x ∈ M).


Theorem 7. For every ordinal number α the set I(α) is well-ordered,
and OI(α) = α; that is, the set I(α) regarded as an ordinal number is equal
to α. In other words: every set A of type α is similar to I(α).


The proof of this theorem is immediate, since the ordinal numbers < α
are represented by the segments (·, x)_A of the set A, and the mapping
x → O(·, x)_A provides an isomorphism between A and I(α). But instead
of this mapping it is often more convenient to consider the inverse mapping
ξ → a_ξ, ξ < α, which indicates how the elements a_ξ of A are represented
by “indices” from I(α). For example, in ftn. 2, p. 156, we have considered
an enumeration r_n (n < ω) of the set Q of all rational numbers.

The rest of the present section is devoted to the following important
question: does there exist an ordinal number ρ with the property that the
entire set R of real numbers can be put into one-to-one correspondence
with a sequence a_ξ (ξ < ρ) of length ρ?

Cantor’s fundamental theorem (theorem of the uncountability of the set
R; cf. IA, §7.3) states that ρ ≠ ω.

We first prove the following theorem. 


Theorem 8. Every set M of ordinal numbers arranged in order of 
magnitude is well-ordered. 


We must show that every non-empty set X ⊆ M has a least element
inf X.⁷ But if β ∈ X, the set I(β) is well-ordered, and thus also the set
I(β) ∩ X. If this set is empty, β itself is the least element of X; otherwise
it is obvious that inf (I(β) ∩ X) = inf X.

Now let Ω denote the set of all ordinal numbers α such that the segment
I(α) (namely, the set of all ordinal numbers < α) is countable.⁸ Then Ω is
a well-defined set,⁹ which by Theorem 8 is well-ordered and thus defines
an ordinal number, denoted by ω_1. Then I(ω_1) = Ω.

Theorem 9. The set Ω has no last element.


For if α < ω_1, then also α + 1 < ω_1, since the addition of one
element to a countable well-ordered set of ordinal numbers produces a set
of the same kind.

The special continuum hypothesis of Cantor (cf. IA, §7.6) is that
ρ = ω_1.

In other words, this conjecture states that the cardinal numbers of R
and I(ω_1) are equal to each other; thus there must exist a one-to-one
mapping f of I(ω_1) onto R. This mapping f enables us to consider, in
addition to the natural ordering of the set R, the following
well-ordering <_f:


f(0) <_f f(1) <_f f(2) <_f ... <_f f(ξ) <_f ... for every ξ < ω_1.


7 For a set M of ordinal numbers inf M or inf_{x∈M} x denotes the largest number α
for which α ≤ x for every x ∈ M.

8 Countable means: empty (zero), finite, or equivalent to the set N of all natural
numbers.

9 In contrast, for example, to the “set” of all ordinal numbers, which is meaningless
(cf. the Burali-Forti paradox, IA, §7.5).


The statement “Every set can be well-ordered” (the well-ordering axiom)
is equivalent to either of the following two statements:¹⁰


For every non-empty set S of non-empty sets there exists a set which 
contains exactly one element from each X € S (Zermelo axiom of choice). 

Every inductive, partially ordered set contains at least one maximal 
element (Zorn lemma, cf. B11). 


Here we have the following definitions: A set M partially ordered by ≤ is
called inductive if every subset K of M that is linearly ordered (not only partially
ordered) has a least upper bound in M; that is, an element a ∈ M with x ≤ a
for all x ∈ K and such that a ≤ a' for every element a' ∈ M satisfying the
condition x ≤ a' for all x ∈ K. By a maximal element m of M we mean an
element for which there is no element x ∈ M with m < x. An ordered set can
have at most one maximal element, which in §2.1 we have called its last element.


In order to continue our comparison of the set Ω with the continuum
R of real numbers let us prove the following theorem:


Theorem 10. If α_n < ω_1 (n ∈ N), then sup_n α_n < ω_1.


In particular, if α_n (n < ω) is a strictly increasing sequence of ordinal
numbers < ω_1, then also sup_n α_n < ω_1.
The corresponding statement is not valid for the linear continuum.


Proof: If A_n is an ordered set of type α_n for every n < ω, then the set
∪_{n<ω} {n} × A_n is countable, since it is the union of countably many
countable sets. If we order this set lexicographically, we obtain a
well-ordered set of type α = α_0 + α_1 + α_2 + ..., so that α < ω_1 and α_n ≤ α.

The set X of numbers ξ ≤ α such that ξ ≥ α_n (n < ω) is a well-defined
subset of the well-ordered set I(α + 1). Thus inf X exists, and we have
inf X ≤ α < ω_1 with inf X = sup_n α_n, as was to be proved.


Theorem 11. The set Ω is not countable.


The same statement holds for R.

If Ω = I(ω_1) were countable, we would have ω_1 ∈ I(ω_1), so that
ω_1 < ω_1, which is a contradiction.

As mentioned earlier, the famous Cantor continuum hypothesis states
that the sets Ω and R have the same power. This hypothesis represents a
postulate independent (see IA, §7.6) of the other axioms of set theory:


The negation of the continuum hypothesis is also a possibility.


10 For proofs of the equivalence see, for example, Birkhoff [1], pp. 42-44. The
well-ordering axiom has recently been proved to be independent of the other axioms of set
theory (see IA, §7.6).


Theorem 12. For every a ∈ R the sets (·, a)_R, (a, ·)_R are isomorphic
to each other and to R.


The mapping x → 1/(a − x) + a represents an isomorphism between
(·, a)_R and (a, ·)_R.

On the other hand: for every α ∈ Ω the set (·, α)_Ω is countable, and the
set (α, ·)_Ω is not countable.


Proof: Since OI(α) = α by Theorem 7, we see that I(α) is countable.
But if the set (α, ·)_Ω were also countable, Ω itself would be countable,
since Ω = I(α) ∪ {α} ∪ (α, ·)_Ω, in contradiction to Theorem 11.


CHAPTER 2. 


Groups 


Introduction 


The concept of a group is a creation of modern mathematics. Some 
notion of it is to be found in the rich ornamentation of classical art and 
architecture, but its fundamental importance and varied applications were 
not recognized until the nineteenth century. 

The theory of groups originated in the study of algebraic equations,
where its central importance was recognized by E. Galois, who introduced
the name “group.” The work of A. Cauchy, C. Jordan, A. Cayley, L. Sylow,
O. Hölder, G. Frobenius, I. Schur, and W. Burnside freed the theory
from this subsidiary position and transformed it into an independent
branch of mathematics, concerned with algebraic operations on sets of
finitely or infinitely many elements.

The late appearance of groups in science shows that a theory based on
them could only have resulted from the modern mathematical method of
generalization and abstraction, the method of thinking in terms of
“systems.” With such concepts as “set,” “group,” “ring,” “field,”
mathematics has reached a stage of great generality. The object of its study is no
longer the special character of certain magnitudes but the structure of whole
domains. In this way it becomes possible to make statements that are
valid for many different fields. For an over-all summary or synthesis of
widely varied parts of mathematics, the notion of a group becomes
indispensable.

For the theory of groups, as for all branches of modern science, the
axiomatic method is characteristic. In this method it becomes unmistakably
clear that the axioms and basic theorems are not necessarily “self-evident”;
in laying the foundations of a logically constructed science we have the
complete “freedom of spirit” that G. Cantor called the very essence of
mathematics. Choice of the axioms is restricted by only one condition:
freedom from self-contradiction. Whether we have made a useful choice is
determined solely by the applications, which in group theory are especially
numerous. Not only does this theory have many applications in other
branches of mathematics, for example in Galois theory or in the
foundations and development of geometry, but its effectiveness and esthetic appeal
make it an important instrument in other branches of science and art as
well: in quantum theory, in crystallography, and in the theory of artistic
form.


1. Axioms and Examples 


1.1. Axioms

Let 𝔊 be a non-empty set and v a (binary) operation (often also called a
product) on 𝔊, that is, a function on the set of ordered pairs (G, H) of
elements G, H ∈ 𝔊 (cf. IB10, §1.2; in particular, 1.2.5). Then 𝔊 is said to be
a group with respect to v if the following four axioms are satisfied:


(V) The values of v lie in 𝔊:
v(G, H) ∈ 𝔊 for all G, H ∈ 𝔊.


(A) v is associative; that is,


v(G, v(H, J)) = v(v(G, H), J) for all G, H, J ∈ 𝔊.


(N) There exists a so-called neutral, or unit, or identity element N in 𝔊,
with
v(N, G) = v(G, N) = G for all G ∈ 𝔊.¹


(I) Every element of 𝔊 has an inverse; that is, for every element G ∈ 𝔊
there exists an element Ḡ such that


v(G, Ḡ) = N.²


When there can be no doubt about the operation in question, the phrase 
“with respect to ...” is ordinarily omitted in the above definition of a 
group. 

From the axioms, as we shall see, it does not follow that


(K) v(G, H) = v(H, G) for all G, H ∈ 𝔊,


but if a group does satisfy (K), it is said to be Abelian or commutative. If
v(G, H) = v(H, G) for the special elements G, H ∈ 𝔊, then G and H are
said to be permutable. For example, by (N) the element N is permutable with
all the elements of 𝔊.


1 It would be enough to require v(G, N) = G for all G ∈ 𝔊, since (N) could then be
derived from the other axioms, but we are not interested here in independence or
other refined questions of axiomatic theory.

2 An element with this property should really be called a right inverse; but we shall
show later, on the basis of the other axioms, that it is also a left inverse [that is, that
it satisfies v(Ḡ, G) = N]; and then it is simply called an inverse.

For simplicity, we shall generally write GH or G + H in place of v(G, H)
and shall speak of the operation v as multiplication or addition. The additive
notation is usually restricted to Abelian groups.

The power |𝔊| of the set 𝔊 (see IA, §7.3) is called the order of 𝔊. If the
order is finite, the group 𝔊 is also said to be finite, and its order is simply
the finite number of elements in 𝔊.


1.2. Examples 


1.2.1. Let I⁺ be the set of rational integers 0, ±1, ±2, ... and let
v(G, H) = G + H be the sum of the numbers G and H in the usual sense.
Then (V) is certainly satisfied, and (A) holds because addition is associative,
as is shown in the foundations of the theory of numbers (see IB1, §1.3,
§2.2). Thus,

(G + H) + J = G + (H + J).


The number 0 has the properties of a neutral element: G + 0 = 0 + G =
G, and −G is inverse to G: G + (−G) = 0. Thus I⁺ is a group, and is
Abelian because (K) holds. Its order is ℵ₀.

1.2.2. Let P⁺, R⁺, and C⁺ be the set of rational, real, and complex
numbers, respectively, and again let v(G, H) = G + H be addition in the
usual sense. Then the axioms (V) through (I) hold as in 1.2.1, so that each
of these sets is an Abelian group with respect to addition. The order of P⁺
is ℵ₀, and the order of R⁺ and C⁺ is the power of the continuum.

1.2.3. Let P^×, R^×, C^× be the set of nonzero rational, real, and complex
numbers, respectively, and let v(G, H) = GH be multiplication in the
ordinary sense. Since multiplication of nonzero numbers is associative and
the product is also nonzero, axioms (V) and (A) are satisfied. The number
1 has the properties of a neutral element, and G⁻¹ is inverse to G. Since (K)
holds, the groups P^×, R^×, and C^× are all Abelian. Their orders coincide
with the corresponding orders in example 1.2.2.

1.2.4. Let $8 be the following set of quotients of polynomials in x: 


x—1!1 I x ) 
x ?1—-x’x—IV 


J 
p= yl, 


For G, He $ let v(G, H) consist of the substitution of G into H. In other 
words, the operation v (G, H) consists of replacing the symbol x in H by the 
element G. For example, if 


G = g(x) = H=h(x)=1--x, 


x—1?’ 
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then 


v(G, H) = 


= 


By a finite number of trials we see that the result of this operation is always 
one of the six functions, so that (V) is satisfied. In order to show that the 
operation is associative, we could in principle test the validity of (A) for 
all the finitely many triples of elements. Less rigorously, we see that in 
performing the operation v(G, v (H, K)) we first replace x in K = k(x) by 
H = h(x); and then, in the expression v(H, K) = k(A(x)) thus obtained, 
we replace x by G = g(x), with the final result k[A(g(x))]. But it is clear 
that the same result will be obtained if we first construct the expression 
v(G, H) = A(g(x)) and then replace k(x) by x, so as to obtain v(v(G, H), K). 
In this group x servesas neutral element, the elements x, 1/x, 
1 —- x, x/(x — 1) are their own inverses, (x — 1)/x is inverse to 1/(1 — x), 
and 1/(1 — x) is inverse to (x — 1)/x, as can easily be verified. Thus § is 
a group of order 6 and is not Abelian; for example, 


$$v\!\left(\frac{1}{x},\ 1 - x\right) = \frac{x - 1}{x} \ne \frac{1}{1 - x} = v\!\left(1 - x,\ \frac{1}{x}\right).$$
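The closure and associativity claims of 1.2.4 can be checked mechanically. The following Python sketch is not part of the original text; the names `FUNCS`, `SAMPLES`, `fingerprint`, and `v` are illustrative assumptions. Each function is identified by its values at three rational sample points, which is enough to tell apart quotients of linear polynomials, and the full table of substitutions is then built by brute force.

```python
from fractions import Fraction
from itertools import product

# The six functions of 1.2.4; the dictionary keys are illustrative labels.
FUNCS = {
    "x": lambda x: x,
    "1/x": lambda x: 1 / x,
    "1-x": lambda x: 1 - x,
    "1/(1-x)": lambda x: 1 / (1 - x),
    "(x-1)/x": lambda x: (x - 1) / x,
    "x/(x-1)": lambda x: x / (x - 1),
}

# Rational sample points whose images under the six functions never hit 0 or 1,
# so that no division by zero occurs in any composition.
SAMPLES = [Fraction(3), Fraction(5), Fraction(5, 7)]


def fingerprint(f):
    """Values of f at the sample points; distinct for the six functions."""
    return tuple(f(x) for x in SAMPLES)


NAME_OF = {fingerprint(f): name for name, f in FUNCS.items()}


def v(g_name, h_name):
    """Substitute G into H, i.e. form x -> h(g(x)), and name the result.

    Raises KeyError if the result were not one of the six functions.
    """
    g, h = FUNCS[g_name], FUNCS[h_name]
    return NAME_OF[fingerprint(lambda x: h(g(x)))]


NAMES = list(FUNCS)
TABLE = {(g, h): v(g, h) for g, h in product(NAMES, repeat=2)}
ASSOCIATIVE = all(
    v(v(g, h), k) == v(g, v(h, k)) for g, h, k in product(NAMES, repeat=3)
)
ABELIAN = all(TABLE[g, h] == TABLE[h, g] for g, h in product(NAMES, repeat=2))
```

Building `TABLE` without a `KeyError` corresponds to axiom (V); `ASSOCIATIVE` and `ABELIAN` report the outcome of the exhaustive tests mentioned in the text.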


1.2.5. Let $\mathfrak{R}$ be a set, which we shall now call a space in order to
distinguish it from other sets to be considered later; and correspondingly, its
elements $P, Q, R, \ldots$ will be called points. Let $S^{\mathfrak{R}}$ be the
set of permutations on $\mathfrak{R}$; that is, the set of one-to-one mappings
of $\mathfrak{R}$ onto itself. If $\sigma \in S^{\mathfrak{R}}$, we denote by
$P\sigma$ the image of the point $P \in \mathfrak{R}$ under the mapping
$\sigma$. Then $\sigma$ has the following properties:

(1) $P\sigma \in \mathfrak{R}$ for all $P \in \mathfrak{R}$,
(2) $P_1\sigma = P_2\sigma$ implies $P_1 = P_2$.

More generally, if for a subset $\mathfrak{Q} \subseteq \mathfrak{R}$ and a
subset $\mathfrak{K} \subseteq S^{\mathfrak{R}}$ we denote by
$\mathfrak{Q}\mathfrak{K}$ the set of elements $P\sigma$, $P \in \mathfrak{Q}$,
$\sigma \in \mathfrak{K}$, then the fact that $\sigma$ is a mapping onto
$\mathfrak{R}$ is equivalent to³

(3) $\mathfrak{R}\sigma = \mathfrak{R}$.

Thus (1), (2), and (3) characterize the permutations $\sigma$ among all the
mappings of $\mathfrak{R}$. In $S^{\mathfrak{R}}$ we now introduce the
following product: for $\sigma, \tau \in S^{\mathfrak{R}}$ we define the
mapping $\sigma\tau$ (see Figure 1) by

(4) $P(\sigma\tau) = (P\sigma)\tau$.

³ No distinction is made here between an element and the set containing it as
sole member. Thus $\sigma$ in (3) in fact represents $\{\sigma\}$, the set
consisting only of $\sigma$.

Thus $\sigma\tau$ is the mapping of $\mathfrak{R}$ which results from
successive applications of the mappings $\sigma$ and $\tau$. We now show that
with this operation $S^{\mathfrak{R}}$ is a group.


[Figure 1. Figure 2.]

(V): We must show that $\sigma\tau$ has the properties (1), (2), (3).

(1): Since $P\sigma \in \mathfrak{R}$ and $P\tau \in \mathfrak{R}$ for all
$P \in \mathfrak{R}$, we have $P(\sigma\tau) = (P\sigma)\tau \in \mathfrak{R}$
for all $P \in \mathfrak{R}$.

(2): If $P_1(\sigma\tau) = P_2(\sigma\tau)$, then
$(P_1\sigma)\tau = (P_2\sigma)\tau$, and thus $P_1\sigma = P_2\sigma$
and $P_1 = P_2$, since (2) holds for $\sigma$ and $\tau$.

(3): Since $\mathfrak{R}\sigma = \mathfrak{R}\tau = \mathfrak{R}$, it follows
that $\mathfrak{R}(\sigma\tau) = (\mathfrak{R}\sigma)\tau = \mathfrak{R}\tau = \mathfrak{R}$.

(A): On the one hand, we have

$$P[(\sigma\tau)\rho] = [P(\sigma\tau)]\rho = [(P\sigma)\tau]\rho$$

for all $P \in \mathfrak{R}$ and $\sigma, \tau, \rho \in S^{\mathfrak{R}}$, and
on the other hand

$$P[\sigma(\tau\rho)] = (P\sigma)(\tau\rho) = [(P\sigma)\tau]\rho,$$

as follows (see Figure 2) from (4).


(N): The mapping 1 defined by $P1 = P$ for all $P \in \mathfrak{R}$ is a
permutation, the so-called identical permutation. For this permutation we have

$$P(1\sigma) = (P1)\sigma = P\sigma$$

and

$$P(\sigma 1) = (P\sigma)1 = P\sigma,$$

so that $1\sigma = \sigma 1 = \sigma$; and therefore 1 has the properties of a
neutral element.

(I): In order to prove the existence of an inverse for
$\sigma \in S^{\mathfrak{R}}$, it must be remembered that since $P$ runs
through all the elements of $\mathfrak{R}$ exactly once, so will $P\sigma$, by
(2) and (3). Thus,

(5) $(P\sigma)\bar\sigma = P$

defines a mapping $\bar\sigma$ of $\mathfrak{R}$ for which (1), (2), and (3)
are satisfied. From (5) we see at once that $\sigma\bar\sigma = 1$, so that
$\bar\sigma$ is an inverse of $\sigma$ and therefore (I) is satisfied.

The group $S^{\mathfrak{R}}$ is called the symmetric group on $\mathfrak{R}$.
When we examine it more closely, as we shall do below, we see that for finite
$\mathfrak{R}$ the order is $|\mathfrak{R}|!$, where $|\mathfrak{R}|$ is the
power of $\mathfrak{R}$.

In order to be able to deal more conveniently with multiplication in
$S^{\mathfrak{R}}$, we identify each of its elements $\sigma$ with a symbol
consisting of two rows: the first row contains every point of $\mathfrak{R}$
exactly once and, directly underneath, the second row contains the images of
these points:

$$\sigma = \begin{pmatrix} P & Q & R & \cdots \\ P\sigma & Q\sigma & R\sigma & \cdots \end{pmatrix}.$$

Two such symbols represent the same permutation if and only if they can
be transformed into each other by a permutation of the columns. Since $\sigma$
is a permutation, the second row also contains every element of $\mathfrak{R}$
exactly once.

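If we denote by $(n)$ the set of natural numbers $1, 2, \ldots, n$, then the
six elements

$$1 = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix}, \quad
\alpha = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}, \quad
\beta = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix},$$

$$\gamma = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}, \quad
\delta = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}, \quad
\varepsilon = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}$$

comprise all the permutations of $S^{(3)}$. The multiplication of $\beta$ and
$\delta$, for example, leads to

$$\beta\delta = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix},$$

which may be read as follows: 1 in $\beta$ into 3, 3 in $\delta$ into 1,
therefore 1 in $\beta\delta$ into 1, and so forth. On the other hand, the
product $\delta\beta$ produces

$$\delta\beta = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}.$$

Thus the group $S^{(3)}$ is not Abelian.

For computation in the space $\mathfrak{R}$ with $|\mathfrak{R}| = n$ this
standard model of $S^{(n)}$ is very convenient.⁴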
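The two-row symbols lend themselves to direct computation. The sketch below is an illustration, not part of the original text: a permutation of $(n)$ is stored as the tuple forming the second row of its symbol, and `compose` follows rule (4), applying the left factor first. The function names and the assignment of the Greek names to the six tuples are assumptions made for this example.

```python
from itertools import permutations


def compose(s, t):
    """Product st in the sense of (4): P(st) = (Ps)t, so s is applied first."""
    return tuple(t[s[i] - 1] for i in range(len(s)))


def inverse(s):
    """The permutation that undoes s, as in (5)."""
    inv = [0] * len(s)
    for point, image in enumerate(s, start=1):
        inv[image - 1] = point
    return tuple(inv)


# Second rows of the two-row symbols (assumed naming of the six elements).
ONE, ALPHA, BETA = (1, 2, 3), (2, 3, 1), (3, 1, 2)
GAMMA, DELTA, EPSILON = (1, 3, 2), (3, 2, 1), (2, 1, 3)

S3 = set(permutations((1, 2, 3)))

BETA_DELTA = compose(BETA, DELTA)  # 1 -> 3 -> 1, 2 -> 1 -> 3, 3 -> 2 -> 2
DELTA_BETA = compose(DELTA, BETA)  # 1 -> 3 -> 2, 2 -> 2 -> 1, 3 -> 1 -> 3
```

Comparing `BETA_DELTA` with `DELTA_BETA` reproduces the observation that the two products differ.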
⁴ The symbol $\mathfrak{S}_n$ is often written in place of $S^{(n)}$.

1.2.6. Let $E_2$ be the Euclidean plane and let $(P, Q)$ be the distance
between two of its points $P$, $Q$. Also, let $B_2$ denote the set of
permutations $\sigma$ on $E_2$ (regarded as a set of points) which leave
invariant the distance between every pair of points:

$$(P\sigma, Q\sigma) = (P, Q).$$

Then $B_2$ is a group under the same operation as for the permutations in
1.2.5; for now

(V): If $\sigma, \tau \in B_2$, then

$$\begin{aligned}
(P(\sigma\tau), Q(\sigma\tau)) &= ((P\sigma)\tau, (Q\sigma)\tau) \\
&= (P\sigma, Q\sigma), && \text{since } \tau \in B_2, \\
&= (P, Q), && \text{since } \sigma \in B_2.
\end{aligned}$$

Thus the product $\sigma\tau$ also leaves invariant the distance between every
pair of points.

(A) was already proved in §1.2.5 for the product of any two permutations.

(N): The identical permutation 1 (see §1.2.5) is an element of $B_2$, since
$(P1, Q1) = (P, Q)$.

(I): We shall show that if $\sigma$ is in $B_2$, then the permutation
$\bar\sigma$ defined in §1.2.5 is also in $B_2$: for let $P, Q \in E_2$, and in
accordance with (3) let $P = P^*\sigma$, $Q = Q^*\sigma$ with
$P^*, Q^* \in E_2$; then

$$\begin{aligned}
(P\bar\sigma, Q\bar\sigma) &= ((P^*\sigma)\bar\sigma, (Q^*\sigma)\bar\sigma) \\
&= (P^*(\sigma\bar\sigma), Q^*(\sigma\bar\sigma)) \\
&= (P^*, Q^*) \\
&= (P^*\sigma, Q^*\sigma) \\
&= (P, Q),
\end{aligned}$$

so that $\bar\sigma \in B_2$. But it was shown in §1.2.5 that
$\sigma\bar\sigma = 1$, so that $B_2$ is in fact a group. If we think of $E_2$
as a rigid plate, the elements of $B_2$ are represented by those motions of
the plate which bring it into coincidence with itself without distortion. Thus
the elements of $B_2$ are called the motions⁵ of $E_2$ and $B_2$ is the group
of motions of $E_2$.

1.2.7. A subset $F$ of the set of points of the Euclidean plane $E_2$ is
called a figure. For a given figure $F$ we consider the set $B_{2,F}$ of
motions $\sigma$ in $B_2$ which map $F$ onto itself; that is, those motions for
which, in the notation introduced for permutations in §1.2.5, we have

(6) $F\sigma = F$.

⁵ They are also called rigid mappings.

If the elements of $B_{2,F}$ are combined in the same way as in the preceding
examples, then $B_{2,F}$ is a group: for we have

(V): If $F\sigma = F$, $F\tau = F$, then also
$F(\sigma\tau) = (F\sigma)\tau = F\tau = F$, so that
$\sigma\tau \in B_{2,F}$.

(A) holds for arbitrary permutations, as was shown in §1.2.5.

(N): The mapping 1 defined in §1.2.5 is in $B_{2,F}$ since $F1 = F$. Since
$1\sigma = \sigma 1 = \sigma$ for arbitrary permutations $\sigma$ on $E_2$, the
mapping 1 is certainly a neutral element for $B_{2,F}$.

(I): If $\sigma$ belongs to $B_{2,F}$, then so does the element $\bar\sigma$
defined in §1.2.5; for if $P \in F$ and, in accordance with (6),
$P = P^*\sigma$, $P^* \in F$, then
$P\bar\sigma = (P^*\sigma)\bar\sigma = P^* \in F$, so that every $\sigma$ has
an inverse in $B_{2,F}$.

The group $B_{2,F}$ is called the group of the figure $F$. As an example let us
consider the group of the four corners of a square in $E_2$. It is easy to see
that this group is the same as the group $B_{2,\square}$ of the entire square
determined by these four corners.

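A motion $\sigma$ of this group is completely characterized by the
corresponding permutation of the four corners, since every point of $E_2$ is
determined by its distances from three noncollinear fixed points.

If we denote the corners by 1, 2, 3, 4 (see Figure 3), we see that not all
permutations of the corners can result from a rigid motion; for example,

$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 4 & 3 \end{pmatrix}$$

cannot represent a motion, since the distance is
$(1\sigma, 4\sigma) = (1, 3) \ne (1, 4)$.
The permutations induced by $B_{2,\square}$ are obviously

$$\begin{pmatrix} 1&2&3&4 \\ 1&2&3&4 \end{pmatrix},\
\begin{pmatrix} 1&2&3&4 \\ 2&3&4&1 \end{pmatrix},\
\begin{pmatrix} 1&2&3&4 \\ 3&4&1&2 \end{pmatrix},\
\begin{pmatrix} 1&2&3&4 \\ 4&1&2&3 \end{pmatrix},$$

$$\begin{pmatrix} 1&2&3&4 \\ 4&3&2&1 \end{pmatrix},\
\begin{pmatrix} 1&2&3&4 \\ 2&1&4&3 \end{pmatrix},\
\begin{pmatrix} 1&2&3&4 \\ 1&4&3&2 \end{pmatrix},\
\begin{pmatrix} 1&2&3&4 \\ 3&2&1&4 \end{pmatrix}.$$

The order of the group $B_{2,\square}$ is therefore 8. This group is not
Abelian, as the reader can easily verify.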
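The count of eight can be confirmed by listing all 24 permutations of the corners and keeping those that preserve every mutual distance. This is a sketch under stated assumptions, not the book's method: the coordinates in `CORNERS` (a unit square with the corners numbered in cyclic order) and all function names are chosen for the example, and only the distances between corners are tested.

```python
from itertools import combinations, permutations

# Assumed coordinates: corners 1, 2, 3, 4 in cyclic order around a unit square.
CORNERS = {1: (0, 0), 2: (1, 0), 3: (1, 1), 4: (0, 1)}


def dist2(a, b):
    """Squared distance between corners a and b (integers, so comparison is exact)."""
    (x1, y1), (x2, y2) = CORNERS[a], CORNERS[b]
    return (x1 - x2) ** 2 + (y1 - y2) ** 2


def preserves_distances(perm):
    """perm[i - 1] is the image of corner i."""
    return all(
        dist2(perm[i - 1], perm[j - 1]) == dist2(i, j)
        for i, j in combinations(range(1, 5), 2)
    )


def compose(s, t):
    """Apply s first, then t, as in §1.2.5 (4)."""
    return tuple(t[s[i] - 1] for i in range(len(s)))


MOTIONS = [p for p in permutations(range(1, 5)) if preserves_distances(p)]
NON_COMMUTING = [
    (s, t) for s in MOTIONS for t in MOTIONS if compose(s, t) != compose(t, s)
]
```

A non-empty `NON_COMMUTING` list supplies the verification, left to the reader in the text, that the group is not Abelian.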

The developments of §1.2.6 and §1.2.7 are independent of the dimension
2 of $E_2$. Thus for the $n$-dimensional Euclidean space $E_n$, with arbitrary
natural number $n$, we can define the group of motions $B_n$ as the set of
permutations on $E_n$ which leave invariant the distance between any two
points; and the corresponding remark holds for the group $B_{n,F}$.

1.2.8. Let $\mathfrak{m}$ be a set and let $B^{\mathfrak{m}}$ be the set⁶ of
all its subsets, including the empty set $\emptyset$. For
$\mathfrak{a}, \mathfrak{b} \in B^{\mathfrak{m}}$ we now define an operation
(written additively) as follows (cf. IA, 9.10):

$$\mathfrak{a} \cup \mathfrak{b} - \mathfrak{a} \cap \mathfrak{b} = \mathfrak{a} + \mathfrak{b}.$$

Thus $\mathfrak{a} + \mathfrak{b}$ consists of those elements of
$\mathfrak{m}$ which belong to exactly one of the sets $\mathfrak{a}$,
$\mathfrak{b}$ (see Figure 4). Then the set $B^{\mathfrak{m}}$ with this
operation is a group. For we have

(V): $\mathfrak{a} + \mathfrak{b}$ is a subset (possibly $\emptyset$) of
$\mathfrak{m}$.

(A): Let us determine which elements lie in
$(\mathfrak{a} + \mathfrak{b}) + \mathfrak{c}$,
$\mathfrak{a}, \mathfrak{b}, \mathfrak{c} \in B^{\mathfrak{m}}$. For this
purpose we think of the elements of $\mathfrak{a}$ as being marked with a
cross, and proceed in the same way for the elements of $\mathfrak{b}$. Then
$\mathfrak{a} + \mathfrak{b}$ consists of those elements that have been marked
with exactly one cross. If we now mark the elements of $\mathfrak{c}$ with a
cross, then $(\mathfrak{a} + \mathfrak{b}) + \mathfrak{c}$ contains exactly
those elements of $\mathfrak{m}$ which have been marked with either one cross
or three crosses. If we construct
$\mathfrak{a} + (\mathfrak{b} + \mathfrak{c})$ in the same way, the elements of
this subset also receive either one cross or three crosses. But since the
number of crosses depends only on the sets $\mathfrak{a}$, $\mathfrak{b}$,
$\mathfrak{c}$ and not on the parentheses, the result is the same in both
cases. Thus
$(\mathfrak{a} + \mathfrak{b}) + \mathfrak{c} = \mathfrak{a} + (\mathfrak{b} + \mathfrak{c})$.

(N): The empty set $\emptyset$ has the properties of a neutral element.

(I): The element $\mathfrak{a}$ is its own inverse for all
$\mathfrak{a} \in B^{\mathfrak{m}}$, since
$\mathfrak{a} + \mathfrak{a} = \emptyset$.
The group $B^{\mathfrak{m}}$ is obviously Abelian and has the order
$2^{|\mathfrak{m}|}$, where $|\mathfrak{m}|$ is the power of $\mathfrak{m}$.
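For a small set the group axioms of 1.2.8 can be tested exhaustively. The following sketch is illustrative only; `power_set`, `add`, and the choice of a three-element set `M` are assumptions of the example. Python's `^` on `frozenset` is the symmetric difference, which is exactly the operation "belongs to exactly one of the two sets".

```python
from itertools import combinations


def power_set(m):
    """All subsets of m, as frozensets (including the empty set)."""
    items = sorted(m)
    return [
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    ]


def add(a, b):
    """a + b: the elements lying in exactly one of a, b."""
    return a ^ b


M = {1, 2, 3}  # assumed example set
B = power_set(M)
EMPTY = frozenset()

ASSOCIATIVE = all(
    add(add(a, b), c) == add(a, add(b, c)) for a in B for b in B for c in B
)
SELF_INVERSE = all(add(a, a) == EMPTY for a in B)
```

The list `B` has $2^{|\mathfrak{m}|}$ entries, in agreement with the stated order of the group.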

⁶ The notation here is intended to suggest the name of G. Boole, who was the
first to consider this definition for the multiplication of subsets.

1.2.9. If $n$ is a natural number, it is shown in the theory of numbers
(see IB6, §2.10) that every rational integer $g$ can be represented in the form

(7) $g = nh + r$,

where the rational integers $h$, $r$ are uniquely determined by $g$, and $r$ is
a reduced remainder for $n$; that is, $0 \le r < n$. Such remainders are often
said to be reduced modulo $n$.

In the set $\{0, 1, \ldots, n - 1\}$ of reduced remainders for $n$, we define
an additively written operation as follows. In order to distinguish this
operation from the usual addition $+$ for rational integers, we denote it by
$\oplus$ and set

$$s \oplus t = r,$$

where $s$, $t$ are reduced remainders for $n$, and $r$ is the reduced
remainder, uniquely determined by (7), of $g = s + t$ (in the ordinary sense of
addition). We now show that the set of reduced remainders for $n$ thus becomes
a group:

(V) is obvious.

(A): We must prove that $(s \oplus t) \oplus u = s \oplus (t \oplus u)$. For
this purpose we set, as in (7),

(8) $s + t = nh + r$, $\quad r + u = ni + \rho$.

Thus $(s \oplus t) \oplus u = \rho$. Similarly, if we set

(9) $t + u = nj + q$, $\quad s + q = nk + \sigma$,

then $s \oplus (t \oplus u) = \sigma$. From (8) it follows that

$$s + t + u = n(h + i) + \rho,$$

and from (9) that

$$s + t + u = n(j + k) + \sigma.$$

Thus $\rho = \sigma$, since by (7) these representations are unique.

(N): The number 0 has the properties of a neutral element.

(I): The inverse for $r$ is $n - r$ if $r \ne 0$ and is 0 if $r = 0$.

We denote this group by $I^{(n)+}$; it is Abelian and its order is $n$.

1.2.10. In order to construct the corresponding group of reduced
remainders with multiplication as the operation instead of addition, we
restrict ourselves to those remainders that are prime to $n$ (i.e., have no
factor in common with $n$ (see IB6, §2.6)), and then we denote the operation
(in order to distinguish it from ordinary multiplication of rational integers)
by $\odot$. We now set

$$s \odot t = r,$$

where $r$ is the reduced remainder of $g = s \cdot t$ (under multiplication in
the ordinary sense) and is therefore uniquely defined by (7). With this
operation the reduced remainders prime to $n$ form a group. For we have

(V): If $(s, n) = (t, n) = 1$ and, by (7),

$$st = nh + r,$$

then also $(r, n) = (st - nh, n) = 1$.

(A): Let $s$, $t$, $u$ be reduced remainders and, as in (7), let

(10) $st = nh + r$, $\quad ru = ni + \rho$

and

(11) $tu = nj + q$, $\quad qs = nk + \sigma$.

Then $(s \odot t) \odot u = \rho$, $s \odot (t \odot u) = \sigma$. From (10) it
then follows that

$$stu = n(hu + i) + \rho,$$

and from (11) that

$$stu = n(js + k) + \sigma.$$

By the uniqueness of (7) we therefore have $\rho = \sigma$.

(N): The number 1 has the properties of a neutral element.

(I): In order to show that every element $r$ has an inverse, we determine
rational integers $\bar r$, $\bar h$ (see IB6, §2.9) such that

$$r\bar r + n\bar h = 1, \qquad 0 < \bar r < n.$$

Then $(\bar r, n) = 1$ and $r\bar r = n \cdot (-\bar h) + 1$, so that
$r \odot \bar r = 1$, as desired.
We denote this group by $I^{(n)\times}$; it is Abelian and (see IB6, §4.2, §5)
has the order

$$\varphi(n) = p_1^{a_1 - 1} \cdots p_r^{a_r - 1} (p_1 - 1) \cdots (p_r - 1),$$

where $n = p_1^{a_1} \cdots p_r^{a_r}$, with distinct primes $p_1, \ldots, p_r$.
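The two groups of reduced remainders can be explored numerically. The sketch below is an illustration under assumptions: the function names are invented for the example, `units` presupposes $n \ge 2$, and the inverse is found with the extended Euclidean algorithm, which yields integers satisfying $r\bar r + n\bar h = 1$ as in (I).

```python
from math import gcd


def ext_gcd(a, b):
    """Return (d, x, y) with a*x + b*y == d == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    d, x, y = ext_gcd(b, a % b)
    return d, y, x - (a // b) * y


def add_mod(s, t, n):
    """s (+) t: reduced remainder of the ordinary sum."""
    return (s + t) % n


def mul_mod(s, t, n):
    """s (.) t: reduced remainder of the ordinary product."""
    return (s * t) % n


def units(n):
    """Reduced remainders prime to n (assumes n >= 2)."""
    return [r for r in range(1, n) if gcd(r, n) == 1]


def unit_inverse(r, n):
    """The remainder r_bar with r (.) r_bar == 1."""
    d, x, _ = ext_gcd(r, n)
    if d != 1:
        raise ValueError("r is not prime to n")
    return x % n


def phi(n):
    """Order of the multiplicative group, from the prime factorization of n."""
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            power = 1
            while m % p == 0:
                m //= p
                power *= p
            result *= (power // p) * (p - 1)  # p^(a-1) * (p - 1)
        p += 1
    if m > 1:  # one prime factor with exponent 1 is left over
        result *= m - 1
    return result
```

For instance, `units(8)` lists the four remainders 1, 3, 5, 7, and `len(units(n))` can be compared with `phi(n)` for any range of `n`.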
1.2.11. We consider the set $\mathfrak{G}^{(3)}$ of real numbers

$$g + h\sqrt{3},$$

where $g$, $h$ are rational integers and
$(g + h\sqrt{3})(g - h\sqrt{3}) = g^2 - 3h^2 = 1$.
The set $\mathfrak{G}^{(3)}$ forms a group under ordinary multiplication of
real numbers. For we have

(V): If

(12) $g_1^2 - 3h_1^2 = g_2^2 - 3h_2^2 = 1$,

then

$$(g_1 + h_1\sqrt{3})(g_2 + h_2\sqrt{3}) = (g_1 g_2 + 3h_1 h_2) + (g_1 h_2 + g_2 h_1)\sqrt{3}$$

is a number in $\mathfrak{G}^{(3)}$, since it follows from (12) that

$$(g_1 g_2 + 3h_1 h_2)^2 - 3(g_1 h_2 + g_2 h_1)^2 = 1.$$

(A) holds for all real numbers.

(N): The number $1 = 1 + 0 \cdot \sqrt{3}$ has the properties of a neutral
element.

(I): Since

$$(g + h\sqrt{3})(g - h\sqrt{3}) = g^2 - 3h^2 = 1,$$

the number $g - h\sqrt{3}$ is inverse to $g + h\sqrt{3}$. The group
$\mathfrak{G}^{(3)}$ is Abelian, and by the theory of sets (see IA, §7.3) it
has the order $\aleph_0$. For if $2 + \sqrt{3}$ is in the group, then by (V)
the numbers $(2 + \sqrt{3})^\nu$, $\nu = 1, 2, \ldots$ are also in
$\mathfrak{G}^{(3)}$, and these numbers are all distinct since
$2 + \sqrt{3} > 1$, so that the group is infinite; on the other hand, the
one-to-one mapping $(g, h) \to g + h\sqrt{3}$ puts $\mathfrak{G}^{(3)}$ in
correspondence with a subset of the countable set of all integral lattice
points of a coordinate plane.

In this proof of the properties of a group the number 3 has played no
particular role; the reader should consider how the group $\mathfrak{G}^{(n)}$
is to be defined in a corresponding way for every natural number $n$.
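The computation in (V) can be carried out with integer pairs, avoiding floating-point square roots. This is a minimal sketch, assuming the representation of $g + h\sqrt{3}$ as the tuple `(g, h)`; the names `mul`, `norm`, `inv`, and `powers` are hypothetical helpers for the illustration.

```python
def mul(a, b):
    """Product of g1 + h1*sqrt(3) and g2 + h2*sqrt(3), as a pair (g, h)."""
    (g1, h1), (g2, h2) = a, b
    return (g1 * g2 + 3 * h1 * h2, g1 * h2 + g2 * h1)


def norm(a):
    """g^2 - 3h^2, which equals 1 exactly for the elements of the group."""
    g, h = a
    return g * g - 3 * h * h


def inv(a):
    """Inverse g - h*sqrt(3) of an element g + h*sqrt(3) of norm 1."""
    g, h = a
    return (g, -h)


def powers(a, count):
    """The first `count` powers a, a^2, a^3, ... of a."""
    out, current = [], (1, 0)
    for _ in range(count):
        current = mul(current, a)
        out.append(current)
    return out


POWERS = powers((2, 1), 5)  # powers of 2 + sqrt(3)
```

Every entry of `POWERS` has norm 1 and the entries are pairwise distinct, which illustrates why the group is infinite.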

1.2.12. Let $K$ be a field and let $K_n^\times$ be the set of square matrices
(see IB3, §2.2, §3.4)

(13) $A = (a_{ik}), \quad a_{ik} \in K$

of order $n$ with nonzero determinant. This set forms a group under the
operation of matrix multiplication. For we have

(V): If $A, B \in K_n^\times$, and if we let $|X|$ denote the determinant of a
matrix $X$, then by the rule for the multiplication of determinants

$$|A \cdot B| = |A| \cdot |B|;$$

thus, if $|A|$, $|B|$ are nonzero, so is $|AB|$.

(A): The associativity of matrix multiplication will be proved in IB3,
§2.2.

(N): The unit matrix

$$\begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{pmatrix}$$

has the properties of a neutral element.

(I): Inverse to $A$ is the matrix $(A_{ki}/|A|)$, where $A_{ik}$ is the
algebraic complement of $a_{ik}$, as is discussed in detail in IB3, §3.5.


1.3. Examples of Systems That Are Not Groups 

After the numerous examples of groups in the preceding section, the 
reader may possibly feel that it is difficult to avoid satisfying the axioms 
for a group. For greater clarity we shall now give some examples of sets 
with an operation that does not make the set into a group. 

1.3.1. Let $N$ be the set of all natural numbers (excluding 0), and for
$n, m \in N$ let

$$v(n, m) = nm$$

(with multiplication in the ordinary sense). Then (V), (A), and (N) are
satisfied but no $n \ne 1$ has an inverse.

1.3.2. For $r, s \in R$, where $R$ is the set of real numbers, let the
operation consist of taking the maximum

$$v(r, s) = \max(r, s).$$

Then (V) and (A) are satisfied, but there is no neutral element and therefore
the concept of an inverse remains undefined.

1.3.3. Let $I$ be the set of rational integers excluding the two numbers
$+2$, $-2$. For $g, h \in I$ let

$$v(g, h) = g + h$$

(with addition in the ordinary sense). Then (A) is satisfied, 0 has the
properties of a neutral element, $-g$ is inverse to $g$ and is contained in
$I$, but (V) is not satisfied since $v(1, 1) \notin I$.

1.3.4. Let $N^*$ be the set of all natural numbers including 0. For
$n, m \in N^*$ set

$$v(n, m) = |n - m|$$

(where the vertical bars denote absolute value). Then (V) is satisfied, 0 has
the properties of a neutral element, $n$ is its own inverse, but (A) is not
satisfied, since

$$v(1, v(2, 3)) = |1 - |2 - 3|| = 0,$$
$$v(v(1, 2), 3) = ||1 - 2| - 3| = 2.$$

2. Immediate Consequences of the Axioms for a Group


To simplify the notation, we shall henceforth write the group operation
multiplicatively, except where otherwise noted.

2.1. As was proved in IB1, §1.3 for an additively written operation, it
follows from the associative law (A) that the value of a product of more
than three factors depends only on the order of the factors and not on the
way in which they are combined in parentheses. For example,

$$(C_1 C_2)(C_3 C_4) = C_1(C_2(C_3 C_4)),$$

and so forth. Thus we can omit the parentheses and for an ordered system
of elements $C_1, \ldots, C_r$ simply write

$$C_1 C_2 \cdots C_r.$$

Then

$$(C_1 \cdots C_k)(C_{k+1} \cdots C_r) = C_1 \cdots C_k C_{k+1} \cdots C_r.$$

2.2. In (N) it was not assumed that a group has only one neutral ele- 
ment. But if N’ is also an element with the properties required by (N), then 
NN'= N, 

NN’ = N’, 


and therefore N = N’. Thus the neutral element in a group is uniquely 
defined; if the group is written multiplicatively, we call the neutral element 
a unit element 1; but if additively written, a zero element 0. 


2.3. Again, in (I) it was not required that an element $G \in \mathfrak{G}$
should have only one inverse. But it follows from the other axioms that if
$\bar G$ is an inverse of $G$ and $\bar{\bar G}$ is an inverse of $\bar G$, then

$$G \bar G \bar{\bar G} = 1 \cdot \bar{\bar G} = G \cdot 1,$$

and so

(1) $\bar{\bar G} = G$

and

(2) $G \bar G = \bar G G = 1$.

Now if $\tilde G$ is also an inverse of $G$, we have $G \tilde G = 1$, and by
multiplication on the left with $\bar G$,

$$\bar G G \tilde G = \bar G.$$

From (2) it follows that $\tilde G = \bar G$, so that the inverse of $G$ is
uniquely determined; in the multiplicative notation we denote this inverse by
$G^{-1}$. Then by (1) and (2) we have

(3) $(G^{-1})^{-1} = G$, $\quad G G^{-1} = G^{-1} G = 1$,

so that $G$ is permutable with its inverse. In the additive notation we write
$-G$ in place of $G^{-1}$. Then the rules (3) read

$$G + (-G) = (-G) + G = 0.$$

In place of $G + (-H)$ we may also write the shorter form $G - H$, but
then, in contrast to the situation for the group operation $+$, we must pay
attention to parentheses. For example, in $I^+$ (§1.2.1),

$$(3 - 4) - 5 \ne 3 - (4 - 5).$$

It is easy to prove the following important rule for the formation of
inverses:

(4) $(G_1 \cdots G_r)^{-1} = G_r^{-1} \cdots G_1^{-1}$.
2.4. From the uniqueness of inverses it follows that a group $\mathfrak{G}$
allows unique two-sided division. More precisely: if $G$, $H$ are arbitrary
elements of $\mathfrak{G}$, there exist uniquely determined elements
$X, Y \in \mathfrak{G}$ for which

(5) $GX = H$, $\quad YG = H$.

In fact, the elements $X = G^{-1}H$ and $Y = HG^{-1}$ have the desired
property:

$$G G^{-1} H = 1 \cdot H = H,$$
$$H G^{-1} G = H \cdot 1 = H,$$

and from $GX_1 = GX_2$ and $Y_1 G = Y_2 G$ it follows after multiplication by
$G^{-1}$ on the left and on the right, respectively, that
$G^{-1}GX_1 = G^{-1}GX_2$ and $Y_1 G G^{-1} = Y_2 G G^{-1}$, so that
$X_1 = X_2$ and $Y_1 = Y_2$, which proves the desired uniqueness.

We now show that in the definition of a group we may replace the axioms
(N) and (I) by two axioms symmetrically constructed with respect to
multiplication on the left and on the right:

For every ordered pair $(G, H)$, $G \in \mathfrak{G}$, $H \in \mathfrak{G}$ we
have
($D_r$) an element $X \in \mathfrak{G}$ with $GX = H$, and
($D_l$) an element $Y \in \mathfrak{G}$ with $YG = H$.

Thus we must prove that (N) and (I) follow from (V), (A), ($D_r$), and
($D_l$). To this end, for a fixed $G \in \mathfrak{G}$, we determine $R$ by
($D_r$) from the equation

$$GR = G, \quad R \in \mathfrak{G}.$$

If $H$ is an arbitrary element of $\mathfrak{G}$ and $Y$ is determined by
($D_l$) from the equation $YG = H$, then

$$HR = (YG)R = Y(GR) = YG = H,$$

and so

$$HR = H \quad \text{for all } H \in \mathfrak{G}.$$

In the same way, for the element $L \in \mathfrak{G}$ determined by ($D_l$)
from $LG = G$, it follows that

$$LH = H \quad \text{for all } H \in \mathfrak{G}.$$

Thus, in particular, $LR = L$ and $LR = R$, so that $L = R = N$ is the
neutral element. Then (I) follows at once from ($D_r$), if we set $H = N$.
Since we have already shown that ($D_r$) and ($D_l$) follow from (V), (A), (N),
(I), we see that the two systems of axioms (V), (A), (N), (I), and (V), (A),
($D_r$), ($D_l$) are equivalent, as desired.
For a finite group $\mathfrak{G}$, the axioms ($D_r$) and ($D_l$) can be
replaced by the following axioms of cancellation:

($K_r$): If $GX_1 = GX_2$, then $X_1 = X_2$.
($K_l$): If $Y_1 G = Y_2 G$, then $Y_1 = Y_2$.

To prove this we consider, for an arbitrary but fixed $G \in \mathfrak{G}$, the
mappings

$$X \to GX, \quad X \in \mathfrak{G},$$
$$Y \to YG, \quad Y \in \mathfrak{G}$$

of $\mathfrak{G}$ into $\mathfrak{G}$. By ($K_r$) and ($K_l$) these two
mappings are one-to-one and thus, since $\mathfrak{G}$ is of finite order, they
are mappings onto $\mathfrak{G}$ (see IB1, §1.5). Consequently, every element
$H \in \mathfrak{G}$ has the form $H = GX$ and $H = YG$, so that ($D_r$) and
($D_l$) are satisfied. Since we have already proved for arbitrary groups that
($K_r$) and ($K_l$) are satisfied, it is clear that for a finite set
$\mathfrak{G}$ the system of axioms (V), (A), (N), (I) is equivalent to the
system (V), (A), ($K_r$), ($K_l$). But, as is shown by the example in §1.3.1,
this equivalence does not necessarily hold for infinite sets $\mathfrak{G}$.


3. Methods of Investigating the Structure of Groups


The great importance of group theory is due to the fact that a few simple 
axioms give rise to a great wealth of theorems. Because of the simplicity 
and naturalness of its axioms, the theory of groups has penetrated deeply, 
as the above examples show, into many parts of mathematics, so that its 
theorems may be interpreted, according to the fields to which they are 
applied, as theorems about numbers, permutations, motions, residues, and 
so forth. 

Consequently, if we wish to describe these theorems in a natural way,
we cannot remain satisfied with the meager vocabulary of the axioms. In
the theory of numbers, for example, it is impossible to give any reasonably
concise description of the results without introducing such terms as
“divisible,” “prime number,” and so forth; and we must now turn to the
construction of a corresponding set of instruments for the analysis of groups.


3.1. Calculation with Complexes 

Let us consider a fixed group $\mathfrak{G}$ and subsets $\mathfrak{K}$,
including the empty set $\emptyset$, of its set of elements. Such subsets will
be called complexes of $\mathfrak{G}$. As is customary in the theory of sets,
we can give a precise description of a set by enclosing its elements in braces
{ }. For example, in the notation of §1.2.5 we have

$$(n) = \{1, 2, \ldots, n\}.$$

If the elements of a set are defined by certain properties, these properties
are written to the right of a vertical stroke; for example (see §1.2.3)

$$\{a \mid a \in R^\times,\ a^{-1} = a\}$$

is the set of numbers in $R^\times$ for which $a^{-1} = a$; that is, the set
$\{1, -1\}$.

Since the complexes of a group are subsets of a set, we have already
defined for them the set-theoretic concepts (see IA, §7.2) “equal” $=$,
“contained in” $\subseteq$, “properly contained in” $\subset$, “intersection”
$\cap$ and “union” $\cup$; the last two are applicable to an arbitrary set $K$
of complexes; for them we write $\bigcap_{\mathfrak{K} \in K} \mathfrak{K}$ and
$\bigcup_{\mathfrak{K} \in K} \mathfrak{K}$.

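If $\mathfrak{K}$ and $\mathfrak{L}$ are complexes of $\mathfrak{G}$, we define
the complex-product $\mathfrak{K}\mathfrak{L}$ as the complex consisting of all
elements representable in the form $KL$ with $K \in \mathfrak{K}$,
$L \in \mathfrak{L}$:

$$\mathfrak{K}\mathfrak{L} = \{KL \mid K \in \mathfrak{K},\ L \in \mathfrak{L}\}.$$

For example, in §1.2.3 for $\mathfrak{G} = P^\times$ and $\mathfrak{K}$ = {1, }, 4}, $\mathfrak{L}$ = {−2, , 4} we
have

$\mathfrak{K}\mathfrak{L}$ = {−2, 4,4, −i, tr, −§, t, ts};

and in §1.2.5, for $\mathfrak{G} = S^{(3)}$ and $\mathfrak{K} = \{1, \beta\}$,
$\mathfrak{L} = \{1, \gamma, \alpha\}$,

$$\mathfrak{K}\mathfrak{L} = \{1, \gamma, \alpha, \beta, \varepsilon\}.$$

If $\mathfrak{G}$ is written additively, we write
$\mathfrak{K} + \mathfrak{L}$ in place of $\mathfrak{K}\mathfrak{L}$. For
$\mathfrak{G} = I^+$ and
$\mathfrak{K} = \{g \mid g \in I^+,\ 2 \text{ divides } g\}$,
$\mathfrak{L} = \{g \mid g \in I^+,\ 4 \text{ divides } g\}$ we thus have
$\mathfrak{K} + \mathfrak{L} = \{g \mid g \in I^+,\ 2 \text{ divides } g\}$.

This multiplication of the complexes of a fixed group is associative, as
follows from the associativity of the operation of the group, and can thus
be extended to more than two factors without use of parentheses. In
general, multiplication of complexes is not commutative, as can be seen by
calculating the elements of $\mathfrak{L}\mathfrak{K}$ in the above example for
the group $S^{(3)}$.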
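The complex-product is a one-line set comprehension once the group operation is available. The following sketch is illustrative only: it models $S^{(3)}$ by tuples of images, with the left factor applied first as in §1.2.5 (4), and the assignment of the Greek names to the tuples is an assumption of the example.

```python
def compose(s, t):
    """Product st: apply s first, then t."""
    return tuple(t[s[i] - 1] for i in range(len(s)))


# Assumed naming of the elements of S^(3) by the second rows of their symbols.
ONE, ALPHA, BETA = (1, 2, 3), (2, 3, 1), (3, 1, 2)
GAMMA, DELTA, EPSILON = (1, 3, 2), (3, 2, 1), (2, 1, 3)


def complex_product(K, L):
    """All products kl with k in K and l in L."""
    return {compose(k, l) for k in K for l in L}


K = {ONE, BETA}
L = {ONE, GAMMA, ALPHA}

KL = complex_product(K, L)
LK = complex_product(L, K)
```

Here `KL` contains $\varepsilon$ but not $\delta$, while `LK` contains $\delta$ but not $\varepsilon$, so the two products differ.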

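By $\mathfrak{K}^{-1}$ we denote the complex consisting of the inverses of the
elements of the complex $\mathfrak{K}$: thus
$\mathfrak{K}^{-1} = \{G^{-1} \mid G \in \mathfrak{K}\}$. Then we have
$(\mathfrak{K}\mathfrak{L})^{-1} = \mathfrak{L}^{-1}\mathfrak{K}^{-1}$
by §2.3(4).

For unions of complexes the following distributive law holds for complex
multiplication:

(1) $\mathfrak{K}(\mathfrak{L} \cup \mathfrak{M}) = \mathfrak{K}\mathfrak{L} \cup \mathfrak{K}\mathfrak{M}$, $\quad (\mathfrak{L} \cup \mathfrak{M})\mathfrak{K} = \mathfrak{L}\mathfrak{K} \cup \mathfrak{M}\mathfrak{K}$,

but for intersections only a weaker form of distributivity is valid:

(2) $\mathfrak{K}(\mathfrak{L} \cap \mathfrak{M}) \subseteq \mathfrak{K}\mathfrak{L} \cap \mathfrak{K}\mathfrak{M}$, $\quad (\mathfrak{L} \cap \mathfrak{M})\mathfrak{K} \subseteq \mathfrak{L}\mathfrak{K} \cap \mathfrak{M}\mathfrak{K}$.

But if $\mathfrak{K} = G$ contains only one element, then here also we have
the equality:

(3) $G(\mathfrak{L} \cap \mathfrak{M}) = G\mathfrak{L} \cap G\mathfrak{M}$, $\quad (\mathfrak{L} \cap \mathfrak{M})G = \mathfrak{L}G \cap \mathfrak{M}G$.

The laws (1) and (2) will not be needed below; the proof of (3) is as
follows: $H \in G(\mathfrak{L} \cap \mathfrak{M})$ is equivalent to
$G^{-1}H \in \mathfrak{L} \cap \mathfrak{M}$, and this in turn is equivalent to
$H \in G\mathfrak{L}$ and $H \in G\mathfrak{M}$.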

For a given complex $\mathfrak{K}$ the complexes of the form
$G^{-1}\mathfrak{K}G$, $G \in \mathfrak{G}$ are called the conjugates or
transforms of $\mathfrak{K}$ (under $\mathfrak{G}$). If $\mathfrak{K}$ is
conjugate only to itself, then $\mathfrak{K}$ is said to be normal or invariant
in $\mathfrak{G}$. For example, 1 is normal in $\mathfrak{G}$. Furthermore, the
whole group $\mathfrak{G}$ is a normal complex, since for
$G \in \mathfrak{G}$ we have:

(4) $G\mathfrak{G} = \mathfrak{G}G = \mathfrak{G}$, or $G^{-1}\mathfrak{G}G = \mathfrak{G}$,

since the equations §2.4(5) have solutions for every pair
$G, H \in \mathfrak{G}$. Also, it is clear that

(5) $\mathfrak{G}\mathfrak{G} = \mathfrak{G}$,

and finally we note that since $(G^{-1})^{-1} = G$, we have:

(6) $\mathfrak{G}^{-1} = \mathfrak{G}$.


3.2. Subgroups 

We now turn our attention to a concept, already used in the examples in
§§1.2.6 and 1.2.7, which is of great importance in studying the structure of
groups. If $\mathfrak{U}$ is a complex from a group $\mathfrak{G}$, it may
happen that the complex $\mathfrak{U}$ forms a group with respect to the
operation of the group $\mathfrak{G}$. Examples are provided by the set $B_2$
of motions on $E_2$ regarded as a complex from the group of all permutations on
$E_2$, and also by the set of motions which map a figure onto itself, regarded
as a complex in the set of motions $B_2$. We describe this situation by saying
that $\mathfrak{U}$ is a subgroup of $\mathfrak{G}$, by which we mean that
under the operation $v$ defined for $\mathfrak{G}$ the set $\mathfrak{U}$
satisfies axioms (V), (A), (N), and (I). Thus $B_2$ and $B_{2,F}$ are subgroups
of $S^{E_2}$ and $B_2$, respectively. The complex
$\mathfrak{K} = \{1, -1\}$ is not a subgroup of $I^+$; it is true that
$\mathfrak{K}$ is a group with respect to multiplication of numbers, but that
is not the operation with respect to which $I^+$ is defined as a group. The
sets 1 and $\mathfrak{G}$ are subgroups of every group $\mathfrak{G}$.
Subgroups other than these trivial (improper) subgroups 1 and $\mathfrak{G}$
are called proper subgroups.

If $\mathfrak{U}$ is a subgroup of $\mathfrak{G}$, the unit element 1 of
$\mathfrak{G}$ must be contained in $\mathfrak{U}$ and must also be the unit
element of $\mathfrak{U}$. For if $U \in \mathfrak{U}$ and $1_{\mathfrak{U}}$
is the unit element of $\mathfrak{U}$, so that $1_{\mathfrak{U}} U = U$, then
this equation for $1_{\mathfrak{U}}$ must also hold in $\mathfrak{G}$ and, by
the uniqueness of division in $\mathfrak{G}$, its solution
$1_{\mathfrak{U}} = 1$ is unique. Similarly, we can easily show that the
inverse of $U$ in $\mathfrak{U}$ is equal to its inverse $U^{-1}$ in
$\mathfrak{G}$.

In order to test whether a given complex is actually a subgroup it would
be necessary, from a formal point of view, to examine all the four axioms
for a group. But as is shown by the examples in §§1.2.6 and 1.2.7, this
process can be shortened: for example, if the axiom (A) of associativity holds
for $\mathfrak{G}$, then it certainly holds for the elements of a subset of
$\mathfrak{G}$. We now prove the following criterion for subgroups, whereby the
process can be still further shortened:

A non-empty complex $\mathfrak{U} \subseteq \mathfrak{G}$ is a subgroup of
$\mathfrak{G}$ if and only if

(7) $\mathfrak{U}\mathfrak{U}^{-1} \subseteq \mathfrak{U}$.

The necessity of this condition (7) is clear at once from (5), (6), and the
properties of inverses in $\mathfrak{U}$.

In order to prove that the criterion is also sufficient, we let $\mathfrak{U}$
be a complex in $\mathfrak{G}$ which satisfies (7). For arbitrary
$U \in \mathfrak{U}$ we then have $1 = UU^{-1} \in \mathfrak{U}$, so that (N)
is satisfied by the neutral element 1 in $\mathfrak{G}$. Moreover,
$1U^{-1} = U^{-1} \in \mathfrak{U}$, so that (I) is satisfied. Also, for
$U_1, U_2 \in \mathfrak{U}$, we have
$U_1(U_2^{-1})^{-1} = U_1 U_2 \in \mathfrak{U}$, so that (V) is satisfied.
Also, we have just seen that (A) holds in $\mathfrak{U}$ because it holds in
$\mathfrak{G}$. Thus $\mathfrak{U}$ is a subgroup of $\mathfrak{G}$, as was to
be proved.
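Criterion (7) translates directly into a test for finite complexes. The sketch below is a hedged illustration in the tuple model of $S^{(3)}$ used for computation in §1.2.5; `is_subgroup` and the naming of the elements are assumptions of the example, and the test presupposes that the elements of `U` already belong to a group.

```python
def compose(s, t):
    """Product st: apply s first, then t."""
    return tuple(t[s[i] - 1] for i in range(len(s)))


def inverse(s):
    """Inverse permutation of s."""
    inv = [0] * len(s)
    for point, image in enumerate(s, start=1):
        inv[image - 1] = point
    return tuple(inv)


def is_subgroup(U):
    """Criterion (7): U is non-empty and U * U^-1 is contained in U."""
    U = set(U)
    return bool(U) and all(compose(u, inverse(w)) in U for u in U for w in U)


# Assumed naming of the elements of S^(3).
ONE, ALPHA, BETA = (1, 2, 3), (2, 3, 1), (3, 1, 2)
GAMMA, DELTA, EPSILON = (1, 3, 2), (3, 2, 1), (2, 1, 3)
```

For example, `is_subgroup({ONE, ALPHA})` is false because $1 \cdot \alpha^{-1} = \beta$ is missing from the complex.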


3.3. By means of this criterion for subgroups it is easy to show that the
following examples are subgroups:

3.3.1. Complex of even numbers in $I^+$.

3.3.2. Complex $\{1, -1\}$ in $P^\times$, $R^\times$, and $C^\times$.

3.3.3. Complex $\mathfrak{A}^{(3)} = \{1, \alpha, \beta\}$ in $S^{(3)}$
(§1.2.5).

3.3.4. Complex $S_P^{\mathfrak{R}}$ of permutations in $S^{\mathfrak{R}}$
(§1.2.5) that leave fixed a given point $P \in \mathfrak{R}$.

The reader will readily verify §3.3.1-3; and in order to show §3.3.4 we
need only point out, in view of the criterion for subgroups, that if $\sigma$
and $\tau$ leave the point $P$ fixed, then so does $\sigma\tau^{-1}$:

$$P(\sigma\tau^{-1}) = (P\sigma)\tau^{-1} = P\tau^{-1} = (P\tau)\tau^{-1} = P.$$


3.4. If $\mathfrak{U}$ is a subgroup of $\mathfrak{G}$, the conjugate complexes
$G^{-1}\mathfrak{U}G$, $G \in \mathfrak{G}$ are also subgroups of
$\mathfrak{G}$. For by §2(4), §3(5), (6) we have

$$G^{-1}\mathfrak{U}G\,(G^{-1}\mathfrak{U}G)^{-1} = G^{-1}\mathfrak{U}G\,G^{-1}\mathfrak{U}^{-1}G = G^{-1}\mathfrak{U}G,$$

so that (7) is satisfied with $G^{-1}\mathfrak{U}G$ in place of $\mathfrak{U}$.

If $\mathfrak{U}$ is a subgroup of $\mathfrak{G}$ and $\mathfrak{V}$ is a
subgroup of $\mathfrak{U}$, then $\mathfrak{V}$ is also a subgroup of
$\mathfrak{G}$, as follows immediately from the criterion for subgroups. Thus
the property of being a subgroup is transitive.

If $Y$ is a set of subgroups of $\mathfrak{G}$, the intersection
$\mathfrak{D} = \bigcap_{\mathfrak{U} \in Y} \mathfrak{U}$ is also a subgroup
of $\mathfrak{G}$; for if $U_1$, $U_2$ are in every $\mathfrak{U} \in Y$, then
by (7) it follows that $U_1 U_2^{-1}$ is in every $\mathfrak{U} \in Y$. On the
other hand, the condition (7) is sufficient, and therefore $\mathfrak{D}$ is a
subgroup.

Thus for every complex $\mathfrak{K} \subseteq \mathfrak{G}$ there exists, as
the intersection of all subgroups $\mathfrak{U} \supseteq \mathfrak{K}$, a
smallest subgroup $\langle\mathfrak{K}\rangle$ of $\mathfrak{G}$ containing
$\mathfrak{K}$:

$$\langle\mathfrak{K}\rangle = \bigcap_{\mathfrak{K} \subseteq \mathfrak{U}} \mathfrak{U}.$$

This subgroup is said to be generated by $\mathfrak{K}$, since
$\langle\mathfrak{K}\rangle$ consists of all the finite products
$K_1^{\varepsilon_1} \cdots K_r^{\varepsilon_r}$, $\varepsilon_i = \pm 1$; in
other words, of all the products that can be “generated” by the elements
$K_1, \ldots, K_r \in \mathfrak{K}$: for we see at once from (V) and (I) that
all such products must lie in $\langle\mathfrak{K}\rangle$ and, on the other
hand, the criterion for subgroups shows the complex of these elements is
a subgroup of $\mathfrak{G}$ containing $\mathfrak{K}$. A complex
$\mathfrak{K}$ for which $\mathfrak{G} = \langle\mathfrak{K}\rangle$ is called
a system of generators of $\mathfrak{G}$, and $\mathfrak{G}$ is said to be
generated by $\mathfrak{K}$. If there exists a finite complex
$\mathfrak{K} = \{E_1, \ldots, E_n\}$ with this property, then $\mathfrak{G}$
is said to have a finite system of generators, or to be finitely generated.
The minimum number $n$ of such generators is called the number of generators
of $\mathfrak{G}$.

The set of subgroups $\mathfrak{U}$, $\mathfrak{V}$ of a group forms a lattice
(see IB9, 2.1) with respect to the operations
$(\mathfrak{U}, \mathfrak{V}) \to \mathfrak{U} \cap \mathfrak{V}$ and
$(\mathfrak{U}, \mathfrak{V}) \to \langle\mathfrak{U} \cup \mathfrak{V}\rangle$.
This lattice provides us with an easily visualized method for studying the
construction of a group. We give the graphs (also called diagrams; see IA,
§9.5) of the lattices of subgroups for some of our examples: $I^{(n)+}$,
§1.2.9 (Figure 5); $S^{(3)}$, §1.2.5 (Figure 6); $B_{2,\square}$, §1.2.7
(Figure 7); $B^{\{1,2,3\}}$, §1.2.8 (Figure 8). In Figure 7 the elements are
written in the form of cycles; see §15.2.1.

For many purposes it is preferable, for easier visualization, to consider
only sublattices and their graphs; for example, any two subgroups
$\mathfrak{U}$, $\mathfrak{V}$ of a group $\mathfrak{G}$ provide a finite
sublattice with the graph represented in Figure 9. In this case several of the
subgroups may coincide.

A subgroup $\mathfrak{U} \ne \mathfrak{G}$ is said to be maximal in
$\mathfrak{G}$ if there exists no subgroup $\mathfrak{V} \ne \mathfrak{G}$ of
$\mathfrak{G}$ containing $\mathfrak{U}$ as a proper subgroup. Similarly, a
subgroup $\mathfrak{U} \ne 1$ of $\mathfrak{G}$ is minimal if $\mathfrak{U}$
has no proper subgroup $\mathfrak{V} \ne 1$.
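The description in §3.4 of the generated subgroup as the set of all finite products of generators and their inverses suggests a simple closure procedure for finite groups. The sketch below is one possible illustration, not the book's construction: it works in the tuple model of $S^{(3)}$, and the names `generated` and `SUBGROUPS` are hypothetical.

```python
from itertools import combinations, permutations


def compose(s, t):
    """Product st: apply s first, then t."""
    return tuple(t[s[i] - 1] for i in range(len(s)))


def inverse(s):
    """Inverse permutation of s."""
    inv = [0] * len(s)
    for point, image in enumerate(s, start=1):
        inv[image - 1] = point
    return tuple(inv)


def generated(gens, n=3):
    """Subgroup of S^(n) generated by gens: close {1} under right
    multiplication by the generators and their inverses."""
    identity = tuple(range(1, n + 1))
    pool = set(gens) | {inverse(g) for g in gens}
    elements, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in pool:
            y = compose(x, g)
            if y not in elements:
                elements.add(y)
                frontier.append(y)
    return elements


ONE, ALPHA, BETA = (1, 2, 3), (2, 3, 1), (3, 1, 2)
GAMMA, DELTA, EPSILON = (1, 3, 2), (3, 2, 1), (2, 1, 3)
S3 = sorted(permutations((1, 2, 3)))

# Every subgroup of S^(3) is generated by at most two elements, so this
# collects all of them (the vertices of the lattice of subgroups).
SUBGROUPS = {
    frozenset(generated(c)) for r in range(3) for c in combinations(S3, r)
}
```

The set `SUBGROUPS` has six members: the trivial subgroup, three subgroups of order 2, one of order 3, and the whole group.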


3.5. Residue Classes or Cosets 

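If $\mathfrak{U}$ is a subgroup of $\mathfrak{G}$, the complexes of the form
$\mathfrak{U}G$ and $G\mathfrak{U}$, $G \in \mathfrak{G}$, are called right
residue classes and left residue classes (or right cosets and left cosets),
respectively. In this section we shall consider only right cosets. If
$H \in \mathfrak{U}G$, then $\mathfrak{U}G = \mathfrak{U}H$, since $H$ has the
form $H = UG$, $U \in \mathfrak{U}$, so that $\mathfrak{U}U = \mathfrak{U}$
and $\mathfrak{U}G = \mathfrak{U}UG = \mathfrak{U}H$, by (4). Thus:

Either $\mathfrak{U}G_1 = \mathfrak{U}G_2$ or
$\mathfrak{U}G_1 \cap \mathfrak{U}G_2 = \emptyset$ for arbitrary cosets
$\mathfrak{U}G_1$, $\mathfrak{U}G_2$. Since every element of $\mathfrak{G}$
lies in some right coset of $\mathfrak{U}$ (because
$G = 1 \cdot G \in \mathfrak{U}G$) the right cosets generate a division into
classes (see IA, §8.5) of the set $\mathfrak{G}$. The power of the set of right
cosets is called the index of $\mathfrak{U}$ (under $\mathfrak{G}$) and is
denoted by $\mathfrak{G} : \mathfrak{U}$. Since for $U \in \mathfrak{U}$ the
mapping $U \to UG$ of $\mathfrak{U}$ onto $\mathfrak{U}G$ is one-to-one because
of the uniqueness of division, all the right cosets of $\mathfrak{U}$ have the
same power as $\mathfrak{U}$. Thus,

(8) $|\mathfrak{G}| = (\mathfrak{G} : \mathfrak{U}) \cdot |\mathfrak{U}|$.

For finite groups this fact was first proved by Lagrange. As a corollary,
the order of a subgroup of a finite group $\mathfrak{G}$ is always a factor of
the order of $\mathfrak{G}$.

The right cosets of the subgroup 1 consist of the individual elements of
$\mathfrak{G}$. Thus in place of $|\mathfrak{G}|$ we may also write
$\mathfrak{G} : 1$. Then (8) takes the form

$$\mathfrak{G} : 1 = (\mathfrak{G} : \mathfrak{U})(\mathfrak{U} : 1).$$

Since the property of being a subgroup is transitive, it follows more
generally from (8) that for any sequence of subgroups

$$\mathfrak{U}_1 \supseteq \mathfrak{U}_2 \supseteq \cdots \supseteq \mathfrak{U}_r$$

of $\mathfrak{G}$ we have:

$$\mathfrak{U}_1 : \mathfrak{U}_r = (\mathfrak{U}_1 : \mathfrak{U}_2) \cdots (\mathfrak{U}_{r-1} : \mathfrak{U}_r).$$

The reader may verify these results in the graphs of Figures 5 through 9.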

A similar discussion of left cosets does not lead to any new results, since 
$\mathfrak{U}G \to G^{-1}\mathfrak{U}$ is a one-to-one mapping of the set of 
right cosets onto the set of left cosets: every left coset is an image, and 
from $G_1^{-1}\mathfrak{U} = G_2^{-1}\mathfrak{U}$ it follows by taking 
inverses that $\mathfrak{U}G_1 = \mathfrak{U}G_2$. The natural assumption 
that the set of right cosets is the same as the set of left cosets is false. 
Consider, for example, $\mathfrak{S}^{(3)}$ (§1.2.5) and 


    $\mathfrak{U} = \{1, \gamma\}$, where $\gamma$ is an element of order 2. 


In this case the set of right cosets is 


    $\{\{1, \gamma\}, \{\alpha, \varepsilon\}, \{\beta, \delta\}\}$, 


and the set of left cosets is 


    $\{\{1, \gamma\}, \{\alpha, \delta\}, \{\beta, \varepsilon\}\}$. 

Subgroups for which these two sets of cosets are the same (as would 
always be the case, for example, with Abelian groups) are extremely 
important, and we shall pay a good deal of attention to them later; they are 
called normal subgroups (see §6). 
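
The coset computations of this section can be checked by machine. The following is a minimal sketch, assuming that permutations of three letters are stored as tuples and that a product "pq" means "apply p first, then q"; both conventions, and all function names, are illustrative choices and are not taken from the text. It verifies equation (8) for a subgroup of order 2 and confirms that its right and left cosets differ.

```python
from itertools import permutations

def compose(p, q):
    """Product pq under the assumed convention: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

S3 = list(permutations(range(3)))      # the six permutations of {0, 1, 2}
identity = (0, 1, 2)
U = [identity, (1, 0, 2)]              # subgroup generated by one transposition

def right_cosets(group, sub):
    """Set of right cosets Ug, each stored as a frozenset."""
    return {frozenset(compose(u, g) for u in sub) for g in group}

def left_cosets(group, sub):
    """Set of left cosets gU, each stored as a frozenset."""
    return {frozenset(compose(g, u) for u in sub) for g in group}

R = right_cosets(S3, U)
L = left_cosets(S3, U)
index = len(R)                          # the index G : U

print(index, len(S3) == index * len(U)) # Lagrange: |G| = (G : U) * |U|
print(R == L)                           # the two partitions are different
```

Every coset has the same number of elements as the subgroup, which is exactly the counting argument behind (8).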


4. Isomorphisms 


If we wish to illustrate the addition of numbers, it makes absolutely no 
difference what objects (fingers, apples) we use. Obviously, the important 
element in the situation is not what we add, but how. This phenomenon, 
which also occurs in group theory and elsewhere, will now be described as 
clearly as possible and given a mathematical formulation. 


4.1. It is obvious that the operation of a group is completely described 
by the following table: 


          |  G    H    I   ... 
       ---+-------------------- 
        G |  GG   GH   GI  ... 
        H |  HG   HH   HI  ... 
        I |  IG   IH   II  ... 
       ...|  ... 


The elements of $\mathfrak{G}$ are entered in the left-hand column and in the 
top row and then the product $GH$ is entered at the intersection of the $G$ 
row and the $H$ column. This table is called the multiplication table for 
$\mathfrak{G}$. 




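For some of our earlier examples it will have the form: 


  $I^{(2)+}$: 

         | 0  1 
       --+------ 
       0 | 0  1 
       1 | 1  0 

  $\mathfrak{B}^{\mathfrak{M}}$, $\mathfrak{M} = \{m\}$: 

         | ∅  𝔐 
       --+------ 
       ∅ | ∅  𝔐 
       𝔐 | 𝔐  ∅ 

  $I^{(8)\times}$: 

         | 1  3  5  7 
       --+------------ 
       1 | 1  3  5  7 
       3 | 3  1  7  5 
       5 | 5  7  1  3 
       7 | 7  5  3  1 

  $\mathfrak{S}^{(3)}$: 

         | 1  α  β  γ  δ  ε 
       --+------------------ 
       1 | 1  α  β  γ  δ  ε 
       α | α  β  1  δ  ε  γ 
       β | β  1  α  ε  γ  δ 
       γ | γ  ε  δ  1  β  α 
       δ | δ  γ  ε  α  1  β 
       ε | ε  δ  γ  β  α  1 

  $I^{+}$: 

          |  0   1   2   3  ...  -1  -2  -3 
       ---+--------------------------------- 
        0 |  0   1   2   3  ...  -1  -2  -3 
        1 |  1   2   3   4  ...   0  -1  -2 
        2 |  2   3   4   5  ...   1   0  -1 
        3 |  3   4   5   6  ...   2   1   0 
       ...| 
       -1 | -1   0   1   2  ...  -2  -3  -4 
       -2 | -2  -1   0   1  ...  -3  -4  -5 
       -3 | -3  -2  -1   0  ...  -4  -5  -6 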

The two-sidedness and uniqueness of division means that every element of 
the group will occur exactly once in each column and in each row. The 
reader should consider how the axioms (N) and (J), and such properties as 
the commutativity of multiplication, are reflected in these tables. 

If we look again at the multiplication table for the group $I^{(8)\times}$, 
it is obvious that the operation of the group will be equally well described 
if we replace the Arabic numerals by Roman numerals, so that the table 
becomes: 


           | I    III  V    VII 
       ----+-------------------- 
       I   | I    III  V    VII 
       III | III  I    VII  V 
       V   | V    VII  I    III 
       VII | VII  V    III  I 

If in the example for $\mathfrak{B}^{\mathfrak{M}}$ we had replaced 
$\emptyset$ and $\mathfrak{M}$ by the numbers 0 and 1, we would have obtained 
the multiplication table for our example $I^{(2)+}$. The phenomenon arising 
in this way from the renaming of elements can be described mathematically as 
follows. 


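4.2. If $\mathfrak{G}$ is a group and $\lambda$ is a one-to-one mapping of 
$\mathfrak{G}$ onto a group $\mathfrak{H}$ such that the image of each 
product is equal to the product of the images, or in other words if 


(1)   $(G_1 G_2)^{\lambda} = G_1^{\lambda} G_2^{\lambda}$ for all $G_1, G_2 \in \mathfrak{G}$, 


then $\lambda$ is said to be an isomorphism of $\mathfrak{G}$ onto 
$\mathfrak{H}$. If such an isomorphism exists, we say that $\mathfrak{G}$ and 
$\mathfrak{H}$ are isomorphic, or that they are of the same type, or the same 
structure, and write $\mathfrak{G} \cong \mathfrak{H}$. 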
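
Condition (1) can be tested mechanically for finite groups. The sketch below is only an illustration: the helper `is_isomorphism` is a hypothetical name, and the operation on the subsets of {m} is assumed here to be the symmetric difference, which is not stated in this section. It confirms that renaming 0 and 1 as the empty set and {m} respects the operation, and shows a renaming of the residues 0, 1, 2, 3 (added mod 4) by 1, 3, 5, 7 (multiplied mod 8) that does not.

```python
from itertools import product

def is_isomorphism(mapping, op_g, op_h):
    """Check that `mapping` (a dict from G to H) is one-to-one and satisfies
    condition (1): the image of a product is the product of the images."""
    elems = list(mapping)
    one_to_one = len(set(mapping.values())) == len(elems)
    respects_product = all(
        mapping[op_g(a, b)] == op_h(mapping[a], mapping[b])
        for a, b in product(elems, repeat=2)
    )
    return one_to_one and respects_product

def add_mod2(a, b):
    return (a + b) % 2

def sym_diff(a, b):
    # assumed operation on subsets: symmetric difference
    return a ^ b

empty, full = frozenset(), frozenset({"m"})
rename = {0: empty, 1: full}
print(is_isomorphism(rename, add_mod2, sym_diff))

# A one-to-one renaming that fails (1): 1 + 1 = 2 goes to 5, but 3 * 3 = 1 mod 8.
bad = {0: 1, 1: 3, 2: 5, 3: 7}
print(is_isomorphism(bad, lambda a, b: (a + b) % 4, lambda a, b: (a * b) % 8))
```

The second call illustrates that a one-to-one mapping between groups of equal order need not be an isomorphism.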

As an exercise, the reader may set up an isomorphism between the group 
YP (§1.2.4) and S®), 

If $\mathfrak{G}$ and $\mathfrak{H}$ are isomorphic, they obviously have the 
same order, and from the definition it is clear that an isomorphism 
$\lambda$ also has the following properties: 

4.2.1. The image of the unit element is the unit element. 

4.2.2. If $G^{-1}$ is the inverse of $G \in \mathfrak{G}$, then 
$(G^{-1})^{\lambda} = (G^{\lambda})^{-1}$, which we abbreviate to 
$G^{-\lambda}$. 

4.2.3. The image of a subgroup is a subgroup. 

4.2.4. The image of a normal complex is normal. 

4.2.5. If $\mathfrak{G}$ is Abelian, then every group isomorphic to 
$\mathfrak{G}$ is also Abelian. 

The reader will readily prove §4.2.2, 3, 5; as examples let us show: 

4.2.1. Since $1^{\lambda} \cdot 1^{\lambda} = (1 \cdot 1)^{\lambda} = 1^{\lambda}$, 
it follows that $1^{\lambda}$ is the unit element of $\mathfrak{H}$. 

4.2.4. Since every element of $\mathfrak{H}$ is an image under the 
isomorphism, the statement follows from the fact that, for a normal complex 
$\mathfrak{K}$, we have 
$G^{\lambda} \mathfrak{K}^{\lambda} G^{-\lambda} = (G \mathfrak{K} G^{-1})^{\lambda}$ 
for all $H = G^{\lambda} \in \mathfrak{H}$. 

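Isomorphism determines a partition into classes; that is, it is a reflexive, 
symmetric, and transitive relation:⁷ the identical mapping of $\mathfrak{G}$ 
onto $\mathfrak{G}$ is obviously an isomorphism, so that we may write 
$\mathfrak{G} \cong \mathfrak{G}$. If $\lambda$ is an isomorphism of 
$\mathfrak{G}$ onto $\mathfrak{H}$, the inverse mapping $\lambda^{-1}$ is an 
isomorphism of $\mathfrak{H}$ onto $\mathfrak{G}$; for from (1) we have 


(2)   $(G_1^{\lambda} G_2^{\lambda})^{\lambda^{-1}} = [(G_1 G_2)^{\lambda}]^{\lambda^{-1}} = G_1 G_2 = (G_1^{\lambda})^{\lambda^{-1}} (G_2^{\lambda})^{\lambda^{-1}}$. 


Also, if $\mu$ is an isomorphism of $\mathfrak{H}$ onto $\mathfrak{J}$, then 
the mapping $\lambda\mu$ is one-to-one and is an isomorphism of 
$\mathfrak{G}$ onto $\mathfrak{J}$: 


(3)   $(G_1 G_2)^{\lambda\mu} = ((G_1 G_2)^{\lambda})^{\mu} = (G_1^{\lambda} G_2^{\lambda})^{\mu} = G_1^{\lambda\mu} G_2^{\lambda\mu}$. 


Thus, $\mathfrak{G} \cong \mathfrak{H}$ and $\mathfrak{H} \cong \mathfrak{J}$ 
imply $\mathfrak{G} \cong \mathfrak{J}$, so that the set of all groups falls 
into classes of isomorphic groups such that no group belongs to more than 
one class. 

⁷ The partition into classes also determines the equivalence relation; see IA, §8.5. 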

Now the fundamental problem, the so-called type or structure problem, 
of the theory of groups is to select, from each class of isomorphic groups, 
a representative, or model, which is to be described as precisely as possible. 
What is meant here by "as precisely as possible" is to a great extent a 
matter of taste. In general, we shall look for a description in terms of con- 
cepts that lie closest to our intuition; for example, numbers, diagrams, and 
so forth. If we can find a model of this sort in every class, we have given a 
complete description of all groups, since any group arises from such a 
model by a mere renaming of the elements. At the present time we are still 
far from such a goal. Only for certain rather small, special classes of 
groups, e.g., the finite Abelian groups and a few others, has a satisfactory 
solution of this problem been found. In the next section we shall carry it 
through for a very simple class of groups, namely, the cyclic groups. 

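If we wish to give a complete account of the isomorphisms $\lambda$ of 
$\mathfrak{G}$ onto $\mathfrak{H}$, it is enough to consider the case 
$\mathfrak{H} = \mathfrak{G}$. For if $\lambda_1$, $\lambda_2$ are two 
isomorphisms of $\mathfrak{G}$ onto $\mathfrak{H}$, the mapping 
$\alpha = \lambda_1 \lambda_2^{-1}$ is an isomorphism of $\mathfrak{G}$ onto 
itself. On the other hand, if $\alpha$ is an isomorphism of $\mathfrak{G}$ 
onto itself, and $\lambda_1$ is an isomorphism of $\mathfrak{G}$ onto 
$\mathfrak{H}$, then $\alpha\lambda_1 = \lambda_2$ is also an isomorphism of 
$\mathfrak{G}$ onto $\mathfrak{H}$. Thus two isomorphisms of $\mathfrak{G}$ 
onto $\mathfrak{H}$ differ from each other only by an isomorphism of 
$\mathfrak{G}$ onto itself. An isomorphism of $\mathfrak{G}$ onto itself is 
called an automorphism of $\mathfrak{G}$. If $\alpha$, $\beta$ are two 
automorphisms of $\mathfrak{G}$, the mapping $\alpha\beta^{-1}$ is also an 
automorphism of $\mathfrak{G}$, as is easily seen from (2) and (3). Since 
automorphisms are combined in the same way as permutations, the set of 
automorphisms of $\mathfrak{G}$ is a subgroup of 
$\mathfrak{S}^{\mathfrak{G}}$, the so-called group of automorphisms of 
$\mathfrak{G}$. 

For example, $I^{(3)+}$ has two automorphisms: 


    $1 = \begin{pmatrix} 0 & 1 & 2 \\ 0 & 1 & 2 \end{pmatrix}, \quad \alpha = \begin{pmatrix} 0 & 1 & 2 \\ 0 & 2 & 1 \end{pmatrix}$. 


Thus for any group $\mathfrak{G}$ isomorphic to $I^{(3)+}$ there exist two 
isomorphisms of $I^{(3)+}$ onto $\mathfrak{G}$. 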
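
The count of automorphisms in this small example can be confirmed by brute force. The following sketch assumes that the group in question is the additive group of residues modulo 3; it tries every permutation of {0, 1, 2} and keeps those satisfying condition (1). The function name is illustrative only.

```python
from itertools import permutations

n = 3
elements = range(n)

def is_automorphism(images):
    """images[g] is the image of g; test (a + b) -> image(a) + image(b) mod n."""
    return all(
        images[(a + b) % n] == (images[a] + images[b]) % n
        for a in elements for b in elements
    )

automorphisms = [p for p in permutations(elements) if is_automorphism(p)]
print(automorphisms)   # each tuple lists the images of 0, 1, 2 in order
```

The two surviving tuples are the bottom rows of the identity and of the mapping that interchanges 1 and 2.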


5. Cyclic Groups 


Corresponding to any element $G$ of a group there exists the subgroup 
$\langle G \rangle$ (§3.4). The structure of such groups is particularly 
simple. In general, a group of the form $\mathfrak{G} = \langle G \rangle$ is 
called cyclic. For example, every group $\mathfrak{G}$ of prime order $p$ is 
cyclic, since for $G \in \mathfrak{G}$ the theorem of Lagrange §3 (8) shows 
that $\langle G \rangle : 1$ must be equal either to 1 or to $p$, so that for 
$G \neq 1$ we have $\langle G \rangle = \mathfrak{G}$. Let us examine the 
cyclic groups a little more closely. 

5.1. For an arbitrary group $\mathfrak{G}$ and $G \in \mathfrak{G}$ we define 
the $n$th power of $G$ as the product 


    $G \cdot G \cdots G$, with $G$ written $n$ times, 

and denote it by $G^n$. Obviously we have 

    $G^n (G^{-1})^n = 1$, 


so that $(G^n)^{-1} = (G^{-1})^n$; and thus we can write $G^{-n}$ for the 
inverse of $G^n$. Finally we set $G^0 = 1$. Then for every rational integer 
$g$ the $g$th power $G^g$ is uniquely defined. Here 


    $(G^g)^h = G^{gh}$ 

and 

(1)   $G^g G^h = G^{g+h}$, 


as follows⁸ at once from the definition for the various cases $g > 0, h > 0$; 
$g < 0, h > 0$; $g > 0, h < 0$; $g < 0, h < 0$. 

By the criterion for subgroups, it follows from (1) that the set of powers 
of $G \in \mathfrak{G}$ is a subgroup, which is obviously equal to 
$\langle G \rangle$. 


5.2. As a special case of the fundamental problem for the theory of 
groups we now reduce the description of the structure of cyclic groups to 
calculation with rational integers. In other words, we prove the following 
fundamental theorem for cyclic groups. 

Let $\mathfrak{G} = \langle G \rangle$ be cyclic. If $\mathfrak{G}$ is of 
finite order $n$, then $\mathfrak{G} \cong I^{(n)+}$. If $\mathfrak{G}$ is of 
infinite order, then $\mathfrak{G} \cong I^{+}$. 

In particular, we have the following corollaries. Every cyclic group is 
Abelian. For every finite order there exists a cyclic group which is uniquely 
determined up to isomorphism. There are no cyclic groups of power higher 
than $\aleph_0$. 

For the proof, we consider the various powers of $G$. By the remark at 
the end of §5.1 we have $\mathfrak{G} = \{G^g \mid g = 0, \pm 1, \dots\}$. If 
$G^{g_1} \neq G^{g_2}$ for all $g_1 \neq g_2$, the mapping $g \to G^g$ is 
one-to-one and is therefore, by (1), an isomorphism of $I^{+}$ onto 
$\mathfrak{G}$; thus in this case the order of $\mathfrak{G}$ is equal to 
that of $I^{+}$. On the other hand, if $G^{g_1} = G^{g_2}$, $g_1 < g_2$, so 
that $G^{g_2 - g_1} = 1$, let $n > 0$ be the smallest positive integer with 
the property that $G^n = 1$. Let $g$ be a rational integer with 
$g = nh + r$, $0 \le r < n$, so that 
$G^g = G^{nh+r} = (G^n)^h G^r = G^r$. From $G^r = G^s$, 
$0 \le r \le s < n$ it follows that $G^{s-r} = 1$, so that $s = r$ by the 
minimal property of $n$. Thus $r \to G^r$ is a one-to-one mapping of 
$I^{(n)+}$ onto $\mathfrak{G}$. Then by (1) this mapping is an isomorphism of 
$I^{(n)+}$ onto $\mathfrak{G}$, which completes the proof of the fundamental 
theorem for cyclic groups. 

⁸ See also IB1, §3.3. 


5.3. For an arbitrary group $\mathfrak{G}$ and $G \in \mathfrak{G}$ the 
number $|\langle G \rangle|$, which has been defined above as the order of 
the group $\langle G \rangle$, is also called the order of the element $G$. 
From the proof in §5.2 it follows that the order of $G$, provided it is 
finite, can be characterized as the smallest natural number $n$ for which 
$G^n = 1$. The same proof shows that divisibility of $g$ by 
$|\langle G \rangle|$ is equivalent to $G^g = 1$. Since the order of $G$ is 
equal to the order of the subgroup $\langle G \rangle$, the theorem of 
Lagrange shows that the order of $G$ is a factor of the order of 
$\mathfrak{G}$. The least common multiple of the orders of all the elements 
of $\mathfrak{G}$, provided it exists, is called the exponent of 
$\mathfrak{G}$. Thus the exponent of a group $\mathfrak{G}$ is the smallest 
natural number $e$ for which $G^e = 1$ for all $G \in \mathfrak{G}$. 

For example, the exponent of $\mathfrak{S}^{(3)}$ is 6. A group with the 
exponent 2 is Abelian: for in general we have (§2 (4)) 
$(G_1 G_2)^{-1} = G_2^{-1} G_1^{-1}$, so that if $G^2 = 1$ and therefore 
$G^{-1} = G$ for all $G \in \mathfrak{G}$, it follows for all such $G$ that 
$G_1 G_2 = G_2 G_1$. 
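
The statement about the exponent of the symmetric group on three letters can be verified directly. This is a minimal sketch, assuming permutations stored as tuples with "apply p first, then q" as the product; the names `compose` and `order` are illustrative. It computes the order of each element as the smallest n with G^n = 1 and then takes the least common multiple (`math.lcm` with several arguments requires Python 3.9 or later).

```python
from itertools import permutations
from math import lcm

def compose(p, q):
    """Product pq under the assumed convention: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def order(g):
    """Smallest natural number n with g^n equal to the identity."""
    identity = tuple(range(len(g)))
    power, n = g, 1
    while power != identity:
        power, n = compose(power, g), n + 1
    return n

S3 = list(permutations(range(3)))
orders = sorted(order(g) for g in S3)
exponent = lcm(*orders)

print(orders)      # orders of the six elements
print(exponent)    # least common multiple of the orders
```

Each order divides the group order 6, in agreement with the theorem of Lagrange.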


5.4. We now examine the subgroups $\mathfrak{U}$ of a cyclic group 
$\langle G \rangle$. To this end we choose $G^d$ in such a way that $d$ is 
the least positive integer with the property $G^d \in \mathfrak{U}$, 
$d \ge 1$, as is always possible, since the subgroup $\mathfrak{U}$ contains 
the inverse of every element in $\mathfrak{U}$. Then $\mathfrak{U}$ contains 
the powers $(G^d)^a$; $a = 0, \pm 1, \dots$. But these powers already exhaust 
all the elements of $\mathfrak{U}$; for if we had $G^g \in \mathfrak{U}$, 
$g = hd + r$, $0 < r < d$, it would follow that 
$G^g (G^d)^{-h} = G^r \in \mathfrak{U}$, in contradiction to the choice of 
$d$. Thus $\mathfrak{U}$ is cyclic of the form $\langle G^d \rangle$. 

It remains to decide when $\langle G^{d_1} \rangle = \langle G^{d_2} \rangle$ 
with $1 \le d_1, d_2$. A necessary and sufficient condition is obviously the 
existence of rational integers $a$, $b$ with 


    $G^{d_1} = G^{a d_2}$, 
    $G^{d_2} = G^{b d_1}$; 

that is, 

(2)   $d_1 = a d_2$, $d_2 = b d_1$, if $|\langle G \rangle|$ is infinite, 
(3)   $d_1 \equiv a d_2$, $d_2 \equiv b d_1 \bmod n$, if $|\langle G \rangle| = n$ is finite. 


Now (2) gives $d_1 = ab d_1$, so that $ab = 1$ and, since $d_1$, $d_2$ are 
positive, $d_1 = d_2$; and (3) is equivalent, by IB6, §4.1, to 


    $n \mid d_1 - a d_2$, $n \mid d_2 - b d_1$. 

But these conditions of divisibility can hold if and only if 

    $(n, d_2) \mid d_1$, $(n, d_1) \mid d_2$ 

or 

    $(n, d_1) \mid (n, d_2)$, $(n, d_2) \mid (n, d_1)$ 

or 

    $(n, d_1) = (n, d_2)$. 


We thus obtain the following result: 

The subgroups of a cyclic group $\langle G \rangle$ are cyclic of the form 
$\langle G^d \rangle$, $1 \le d$. Under the mapping 
$d \to \langle G^d \rangle$ they are in one-to-one correspondence with the 
natural numbers $d = 1, 2, \dots$, if $|\langle G \rangle|$ is infinite, and 
with the positive divisors $d$ of $n$ if $|\langle G \rangle| = n$ is finite. 


Here $d$ is the index of the subgroup $\mathfrak{U} = \langle G^d \rangle$, 
since for every $G^g \in \langle G \rangle$ there exists exactly one $r$, 
$0 \le r < d$, with $g = hd + r$; thus 


    $G^g = (G^d)^h G^r \in \mathfrak{U} G^r$, 


and the complexes $\mathfrak{U} G^r$ generate each coset of $\mathfrak{U}$ 
exactly once. 
As an application of these results the reader may prove the following 
theorem: every minimal subgroup is of prime order. 
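
The correspondence between subgroups and divisors can be checked numerically for a finite cyclic group. The sketch below models the cyclic group of order n as the residues modulo n under addition, so that the "power" G^d becomes the residue d; the value n = 12 and all names are illustrative assumptions.

```python
from math import gcd

def generated(d, n):
    """Subgroup of the residues mod n (under addition) generated by d."""
    return frozenset((d * a) % n for a in range(n))

n = 12
subgroups = {generated(d, n) for d in range(1, n + 1)}
divisors = [d for d in range(1, n + 1) if n % d == 0]

# <G^d1> = <G^d2> exactly when (n, d1) = (n, d2)
same_gcd_rule = all(
    (generated(d1, n) == generated(d2, n)) == (gcd(n, d1) == gcd(n, d2))
    for d1 in range(1, n + 1) for d2 in range(1, n + 1)
)

# for a divisor d of n, the index of <G^d> equals d
indices = {d: n // len(generated(d, n)) for d in divisors}

print(len(subgroups), len(divisors))
print(same_gcd_rule)
print(indices)
```

For n = 12 there is one subgroup for each of the six divisors 1, 2, 3, 4, 6, 12.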


6. Normal Subgroups and Factor Groups 


6.1. In §3.5 we saw that a right coset for a subgroup of $\mathfrak{G}$ is 
not necessarily a left coset. We now examine those subgroups of a group for 
which every right coset is also a left coset, a property which, though at 
first glance it seems insignificant, has several very important 
consequences. Such subgroups $\mathfrak{N}$ are given the special name of 
normal subgroups or, corresponding to the terminology of §3.1, invariant 
subgroups. By §3.5 they are characterized by the fact that 


(1)   $\mathfrak{N}G = G\mathfrak{N}$ or $G\mathfrak{N}G^{-1} = \mathfrak{N}$ for all $G \in \mathfrak{G}$. 


The second identity means that $\mathfrak{N}$ coincides with all its 
conjugates in $\mathfrak{G}$. 


6.2. In an Abelian group every subgroup is a normal subgroup, as we 
have already seen. But the Abelian groups are not the only groups with 
this property. There exist non-Abelian groups in which every subgroup is 
invariant. For such groups, called Hamiltonian groups, the type problem 
has been satisfactorily solved. We shall content ourselves here with an 
example of order 8, the so-called quaternion group $\mathfrak{Q}$. Its 
elements are the following matrices with coefficients from the field of 
complex numbers, and its operation is matrix multiplication (cf. IB8, §3.1): 


=(0 rh = ( 1k J=G oh = (G oO) 


(9 8 HO hd 


The lattice of subgroups of $\mathfrak{Q}$ has the form shown in Figure 10. 

If a subgroup $\mathfrak{U}$ of $\mathfrak{G}$ has the index 2, it is a 
normal subgroup, since in this case the coset distinct from $\mathfrak{U}$ 
contains exactly those elements of $\mathfrak{G}$ that are not in 
$\mathfrak{U}$. For example, the subgroup $\{1, \alpha, \beta\}$ (§3.3.3) is 
a normal subgroup of $\mathfrak{S}^{(3)}$. But the fact that a subgroup of 
index 3 need not be a normal subgroup is shown by the example 
$\mathfrak{U} = \{1, \gamma\} \subset \mathfrak{S}^{(3)}$. 


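6.3. The fundamental role of normal subgroups is indicated by the 
following theorem: 


A subgroup of $\mathfrak{G}$ is normal in $\mathfrak{G}$ if and only if the 
set of its right cosets (or alternatively, of its left cosets) is a group 
with respect to complex multiplication. 


For the proof we first let $\mathfrak{N}$ be a normal subgroup of 
$\mathfrak{G}$. From (1) it then follows that 


(2)   $\mathfrak{N}G\,\mathfrak{N}H = \mathfrak{N}\mathfrak{N}GH = \mathfrak{N}GH$, 


so that (V) is satisfied. The associativity of complex multiplication was 
already shown in §3.1. The neutral element is $\mathfrak{N}$: 


    $\mathfrak{N}G\,\mathfrak{N} = \mathfrak{N}\mathfrak{N}G = \mathfrak{N}G$; 


the inverse of $\mathfrak{N}G$ is $\mathfrak{N}G^{-1}$, since 
$\mathfrak{N}G\,\mathfrak{N}G^{-1} = \mathfrak{N}\mathfrak{N} = \mathfrak{N}$. 
Thus the cosets of $\mathfrak{N}$ form a group. 

Conversely, let $\mathfrak{N}$ be a subgroup of $\mathfrak{G}$ whose right 
cosets form a group, so that in particular the product 
$\mathfrak{N}G\,\mathfrak{N}G^{-1}$ is a right coset for all 
$G \in \mathfrak{G}$. Since 
$1 = 1 \cdot G \cdot 1 \cdot G^{-1} \in \mathfrak{N}G\,\mathfrak{N}G^{-1}$, 
we have by §3.5 


    $\mathfrak{N}G\,\mathfrak{N}G^{-1} = \mathfrak{N} \cdot 1 = \mathfrak{N}$, 

and thus 

    $G\mathfrak{N}G^{-1} = 1 \cdot G\mathfrak{N}G^{-1} \subseteq \mathfrak{N}G\,\mathfrak{N}G^{-1} = \mathfrak{N}$, 

and 

    $\mathfrak{N} \subseteq G^{-1}\mathfrak{N}G$; 


consequently, $\mathfrak{N} = G\mathfrak{N}G^{-1}$, since if $G$ runs through 
all elements of $\mathfrak{G}$, so does $G^{-1}$. Thus $\mathfrak{N}$ is a 
normal subgroup of $\mathfrak{G}$. 

Similarly we could have shown that $\mathfrak{N}$ is a normal subgroup if and 
only if its left cosets form a group. 

The group of cosets of $\mathfrak{N}$ is denoted by 
$\mathfrak{G}/\mathfrak{N}$ and is called the factor group of $\mathfrak{N}$ 
(under $\mathfrak{G}$) or the factor group of $\mathfrak{G}$ by 
$\mathfrak{N}$. The order of $\mathfrak{G}/\mathfrak{N}$ is equal to the 
index $\mathfrak{G} : \mathfrak{N}$. 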
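
The theorem of §6.3 can be observed on the symmetric group on three letters. The following sketch (with illustrative names, and assuming the product convention "apply p first, then q") tests whether the complex product of any two right cosets is again a right coset, once for the subgroup of order 3 and once for a subgroup of order 2.

```python
from itertools import permutations

def compose(p, q):
    """Product pq under the assumed convention: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def complex_product(A, B):
    """Complex multiplication: all products ab with a in A, b in B."""
    return frozenset(compose(a, b) for a in A for b in B)

def right_cosets(group, sub):
    return {frozenset(compose(u, g) for u in sub) for g in group}

def cosets_closed(group, sub):
    """True if the product of any two right cosets is again a right coset."""
    cosets = right_cosets(group, sub)
    return all(complex_product(A, B) in cosets for A in cosets for B in cosets)

S3 = list(permutations(range(3)))
A3 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]   # identity and the two 3-cycles
U = [(0, 1, 2), (1, 0, 2)]               # identity and one transposition

print(cosets_closed(S3, A3))   # subgroup of index 2
print(cosets_closed(S3, U))    # subgroup of index 3
print(len(right_cosets(S3, A3)))
```

Only for the subgroup of index 2 do the cosets multiply to cosets, and the resulting factor group has order equal to the index, namely 2.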


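6.4. In an Abelian group every factor group is also Abelian. If 
$\mathfrak{G} = \langle G \rangle$ is cyclic, every factor group 
$\langle G \rangle / \langle G^d \rangle$ is also cyclic, namely equal to 
$\langle \langle G^d \rangle G \rangle$. 

If $\mathfrak{N}$ is a normal subgroup with $\mathfrak{N} \neq 1$, the group 
$\mathfrak{G}$ is split by $\mathfrak{N}$ into two groups 
$\mathfrak{G}/\mathfrak{N}$ and $\mathfrak{N}$. These groups are usually 
simpler in structure than $\mathfrak{G}$ itself (for example, in the finite 
case they are of smaller order) so that by examining them separately we can 
gain insight into the structure of $\mathfrak{G}$. Especially interesting 
from this point of view are the groups that do not allow any splitting in 
this sense; in other words, groups which have no normal subgroups except 1 
and $\mathfrak{G}$. Such groups are called simple. Simple Abelian groups are 
of prime order. Conversely, every group of prime order is simple and Abelian. 
As we shall see later, there exist simple groups (sometimes called properly 
simple) that are not Abelian. Their existence offers a serious obstacle to 
the solution of the structure problem. 

It is easy to show that the intersection of a set of normal subgroups is 
again a normal subgroup. 

If $\mathfrak{N}$ is a normal subgroup and $\mathfrak{U}$ is any subgroup of 
$\mathfrak{G}$, the complex $\mathfrak{N}\mathfrak{U}$ is a subgroup of 
$\mathfrak{G}$ and is obviously equal to the group 
$\langle \mathfrak{N} \cup \mathfrak{U} \rangle$ generated by 
$\mathfrak{N} \cup \mathfrak{U}$. For with $N_1, N_2 \in \mathfrak{N}$; 
$U_1, U_2 \in \mathfrak{U}$ it follows from (1) that 

    $N_1 U_1 (N_2 U_2)^{-1} = N_1 U_1 U_2^{-1} N_2^{-1}$ 
    $= N_1 (U_1 U_2^{-1}) N_2^{-1} (U_1 U_2^{-1})^{-1} U_1 U_2^{-1}$ 
    $\in \mathfrak{N}\mathfrak{U}$, 


so that the criterion for subgroups is satisfied for 
$\mathfrak{N}\mathfrak{U}$. In particular, if $\mathfrak{U} = \mathfrak{M}$ 
is also a normal subgroup of $\mathfrak{G}$, then 


    $G\mathfrak{N}\mathfrak{M}G^{-1} = G\mathfrak{N}G^{-1}\,G\mathfrak{M}G^{-1} = \mathfrak{N}\mathfrak{M}$, 

so that $\mathfrak{N}\mathfrak{M}$ is a normal subgroup of $\mathfrak{G}$. 

A normal subgroup $\mathfrak{N}$ of $\mathfrak{G}$ is also a normal subgroup 
of every subgroup $\mathfrak{U} \subseteq \mathfrak{G}$ that contains 
$\mathfrak{N}$. The subgroups $\mathfrak{U}$ that contain $\mathfrak{N}$ are 
in one-to-one correspondence under the mapping 


(3)   $\mathfrak{U} \to \mathfrak{U}/\mathfrak{N}$ 


with the subgroups $\mathfrak{U}/\mathfrak{N}$ of 
$\mathfrak{G}/\mathfrak{N}$. More precisely, we have here a lattice 
isomorphism (see IB9, §2.1) of the lattice of subgroups of $\mathfrak{G}$ 
containing $\mathfrak{N}$ onto the lattice of subgroups of 
$\mathfrak{G}/\mathfrak{N}$. Moreover, the normal subgroups of the two groups 
are in one-to-one correspondence. 

If $\mathfrak{G}$ is generated by $\mathfrak{K}$, then 
$\mathfrak{G}/\mathfrak{N}$ is generated by 
$\{\mathfrak{N}K \mid K \in \mathfrak{K}\}$; for if $\mathfrak{N}G$ is a 
coset of $\mathfrak{N}$ and 
$G = K_1^{\varepsilon_1} \cdots K_r^{\varepsilon_r}$, 
$K_i \in \mathfrak{K}$, then $\mathfrak{N}G$ can be represented as 


    $\mathfrak{N}G = \mathfrak{N}K_1^{\varepsilon_1} \cdots K_r^{\varepsilon_r}$ 
    $= (\mathfrak{N}K_1)^{\varepsilon_1} \cdots (\mathfrak{N}K_r)^{\varepsilon_r}$. 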

7. The Commutator Group 


For any given group it is possible, by using the concept of a factor 
group, to define a certain subgroup which, roughly speaking, measures 
the extent to which the group departs from being Abelian. For this 
purpose we consider a set $\mathfrak{M}$ of normal subgroups $\mathfrak{N}$ 
of $\mathfrak{G}$ whose factor groups are all Abelian. Let $\mathfrak{D}$ be 
the intersection of all the normal subgroups in $\mathfrak{M}$, so that 
$\mathfrak{D}$ is a normal subgroup of $\mathfrak{G}$. We will now show 
that $\mathfrak{G}/\mathfrak{D}$ is also Abelian: for we have 
$\mathfrak{N}G\,\mathfrak{N}H = \mathfrak{N}H\,\mathfrak{N}G$, that is, 
$\mathfrak{N}GH = \mathfrak{N}HG$ or 
$\mathfrak{N}GHG^{-1}H^{-1} = \mathfrak{N}$ for all 
$\mathfrak{N} \in \mathfrak{M}$ and $G, H \in \mathfrak{G}$. Thus every 
element $GHG^{-1}H^{-1}$ lies in all the $\mathfrak{N} \in \mathfrak{M}$: 


    $GHG^{-1}H^{-1} \in \mathfrak{D}$ and thus $\mathfrak{D}GHG^{-1}H^{-1} = \mathfrak{D}$. 


Consequently, 

    $\mathfrak{D}GH = \mathfrak{D}HG$, 

    $\mathfrak{D}G\,\mathfrak{D}H = \mathfrak{D}H\,\mathfrak{D}G$, 


so that $\mathfrak{G}/\mathfrak{D}$ is commutative. If $\mathfrak{M}$ is the 
set of all normal subgroups of $\mathfrak{G}$ with Abelian factor group, it 
follows that: 

In every group there exists a normal subgroup, with Abelian factor group, 
which is contained in every normal subgroup with Abelian factor group and 
is the intersection of all such normal subgroups. It is called the commutator 
group of $\mathfrak{G}$ and is denoted by $\mathfrak{G}'$. 

It is easy to see that the commutator group is generated by the set of 
all elements $GHG^{-1}H^{-1}$, the so-called commutators of $\mathfrak{G}$. 
To say that $\mathfrak{G}' = 1$ is equivalent to saying that $\mathfrak{G}$ 
is Abelian. In the other extreme case, $\mathfrak{G} = \mathfrak{G}'$, the 
group $\mathfrak{G}$ is called perfect. A properly simple group is perfect, 
since 1 and $\mathfrak{G}$ are its only normal subgroups and therefore 
$\mathfrak{G} = \mathfrak{G}'$. The commutator group of $\mathfrak{S}^{(3)}$ 
is the subgroup $\mathfrak{A}^{(3)}$ (§3.3.3); the factor group 
$\mathfrak{S}^{(3)}/\mathfrak{A}^{(3)}$ is Abelian and 1 is the only proper 
subgroup of $\mathfrak{A}^{(3)}$; however, 
$\mathfrak{S}^{(3)}/1 \cong \mathfrak{S}^{(3)}$ is not Abelian. 
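
The commutator group of the symmetric group on three letters can be computed from its definition as the subgroup generated by all commutators. The sketch below uses tuples for permutations with the assumed convention "apply p first, then q"; the helper names are illustrative.

```python
from itertools import permutations

def compose(p, q):
    """Product pq under the assumed convention: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def commutator(g, h):
    """The element g h g^-1 h^-1."""
    return compose(compose(g, h), compose(inverse(g), inverse(h)))

def generate(gens, identity):
    """Subgroup generated by gens; closing under products suffices when finite."""
    elems = {identity} | set(gens)
    while True:
        new = {compose(a, b) for a in elems for b in elems} - elems
        if not new:
            return elems
        elems |= new

S3 = list(permutations(range(3)))
identity = (0, 1, 2)
commutators = {commutator(g, h) for g in S3 for h in S3}
derived = generate(commutators, identity)

print(sorted(derived))             # the identity and the two 3-cycles
print(len(S3) // len(derived))     # order of the factor group
```

The factor group by the commutator group has order 2 and is therefore Abelian, while the group itself is not, since its commutator group is not 1.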


8. Direct Products 
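8.1. Let $\mathfrak{M}$ and $\mathfrak{N}$ be normal subgroups of 
$\mathfrak{G}$, with 


(1)   $\mathfrak{M}\mathfrak{N} = \mathfrak{G}$, $\mathfrak{M} \cap \mathfrak{N} = 1$. 


If $M_1 N_1 = M_2 N_2$; $M_1, M_2 \in \mathfrak{M}$; 
$N_1, N_2 \in \mathfrak{N}$, then $M_1^{-1} M_2 = N_1 N_2^{-1} = 1$ 
and $M_1 = M_2$, $N_1 = N_2$. Thus if the elements $G \in \mathfrak{G}$ are 
represented in the form 


(2)   $G = MN$; $M \in \mathfrak{M}$, $N \in \mathfrak{N}$, 


the $M$, $N$ are uniquely determined by the $G$. 
Now consider a product $MN = NM \cdot M^{-1}N^{-1}MN$. Since $\mathfrak{M}$ 
and $\mathfrak{N}$ are normal subgroups, we have 


    $M^{-1}N^{-1}M \in \mathfrak{N}$, $N^{-1}MN \in \mathfrak{M}$ 


and thus 

    $(M^{-1}N^{-1}M)N = M^{-1}(N^{-1}MN) \in \mathfrak{M} \cap \mathfrak{N} = 1$, 


so that every element of $\mathfrak{M}$ permutes with every element of 
$\mathfrak{N}$. Thus the operation of the group $\mathfrak{G}$ is completely 
determined by the products of elements from $\mathfrak{M}$ and 
$\mathfrak{N}$: 


(3)   $(M_1 N_1)(M_2 N_2) = M_1 M_2 \cdot N_1 N_2$. 


We describe this special case by saying that $\mathfrak{G}$ is decomposable 
into the direct product of $\mathfrak{M}$ and $\mathfrak{N}$. It is customary 
to write (1) in the shorter form: 
$\mathfrak{G} = \mathfrak{M} \times \mathfrak{N}$. 

The mapping $\sigma$, with $(\mathfrak{M}N)^{\sigma} = N$, is an isomorphism 
of $\mathfrak{G}/\mathfrak{M}$ onto $\mathfrak{N}$: 


    $\mathfrak{G}/\mathfrak{M} \cong \mathfrak{N}$. 


For by (1) the requirement §4(1) is satisfied, and the mapping $\sigma$ is 
onto and is one-to-one, since every coset of $\mathfrak{M}$ contains its image. 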
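
Conditions (1) and the uniqueness of the representation (2) are easy to observe on a small Abelian example. The sketch below takes the residues 1, 3, 5, 7 under multiplication modulo 8 and the two subgroups {1, 3} and {1, 5}; this choice of subgroups is an illustrative assumption, not an example worked in the text.

```python
def mul(a, b):
    """Multiplication of residues modulo 8."""
    return (a * b) % 8

G = {1, 3, 5, 7}
M, N = {1, 3}, {1, 5}      # two subgroups, normal because G is Abelian

# Collect every way of writing an element of G as a product m * n.
representations = {}
for m in sorted(M):
    for n in sorted(N):
        representations.setdefault(mul(m, n), []).append((m, n))

covers_group = set(representations) == G                       # MN = G
trivial_meet = (M & N) == {1}                                   # M ∩ N = 1
unique = all(len(r) == 1 for r in representations.values())    # (2) is unique

print(covers_group, trivial_meet, unique)
print(representations[7])
```

Since both conditions in (1) hold, each element has exactly one representation, and the order of the group is the product of the orders of the two factors.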


8.2. More generally, we say that $\mathfrak{G}$ is the direct product of 
$\mathfrak{M}_i$, $i = 1, \dots, r$, and write 


    $\mathfrak{G} = \mathfrak{M}_1 \times \dots \times \mathfrak{M}_r$, 

if the $\mathfrak{M}_i$ are normal subgroups of $\mathfrak{G}$, with 
$M_i M_k = M_k M_i$ for $M_i \in \mathfrak{M}_i$, $M_k \in \mathfrak{M}_k$, 
$i \neq k$, and if every element $G \in \mathfrak{G}$ can be written as the 
product 


(4)   $G = M_1 \cdots M_r$ 


with uniquely determined $M_i \in \mathfrak{M}_i$. This extension of our 
notation is justified by the fact that the direct product as now defined can 
be obtained by iteration from the earlier definition. For if 


    $\mathfrak{G} = \mathfrak{M} \times \mathfrak{N}$, $\mathfrak{N} = \mathfrak{L} \times \mathfrak{K}$, 

we have 


    $\mathfrak{G} = \mathfrak{M} \times \mathfrak{L} \times \mathfrak{K}$ 


in the generalized notation. 
From the uniqueness of the representation (4) it follows that the order 
of a direct product is given by 


    $|\mathfrak{G}| = |\mathfrak{M}_1| \cdots |\mathfrak{M}_r|$. 


If $\mathfrak{G}$ is the direct product of Abelian groups, then 
$\mathfrak{G}$ itself is Abelian, as follows directly from (3). Every group 
can be decomposed into the direct product 
$\mathfrak{G} = 1 \times \mathfrak{G}$. A group which allows no other 
decomposition into a direct product is said to be indecomposable. For 
example, the group $\mathfrak{S}^{(3)}$ is indecomposable, since it has only 
one proper normal subgroup $\neq 1$. 


8.3. An exact knowledge of the indecomposable groups would be very 
desirable; in a certain sense they represent an analogue in the theory of 
groups to prime numbers in the theory of numbers. For a very extensive 
class of groups, which includes for example the finite groups and the 
finitely generated Abelian groups, we have the following theorem, in 
analogy with unique decomposition into prime numbers. 

If 

    $\mathfrak{M}_1 \times \dots \times \mathfrak{M}_r \cong \mathfrak{N}_1 \times \dots \times \mathfrak{N}_s$, 


with $\mathfrak{M}_i$, $\mathfrak{N}_k$ indecomposable; $i = 1, \dots, r$; 
$k = 1, \dots, s$, then $r = s$, and there exists a permutation $\sigma$ of 
the indices such that $\mathfrak{M}_i \cong \mathfrak{N}_{i^{\sigma}}$. 

In the following section we shall determine the indecomposable groups 
in the class of all finite Abelian groups, and the direct product will provide 
us with a suitable solution of the type problem for this class of groups. 


9. Abelian Groups 


In §9.1 we state certain elementary but basic properties of Abelian 
groups which will be useful in what follows. 

9.1. As mentioned before, every subgroup of an Abelian group 
$\mathfrak{G}$ is a normal subgroup, and for two such groups 
$\langle \mathfrak{U} \cup \mathfrak{V} \rangle = \mathfrak{U}\mathfrak{V}$. 
Also, in Abelian groups we have the following power rule: 


    $(GH)^g = GH \cdots GH = G^g H^g$; $G, H \in \mathfrak{G}$. 


From this rule it follows that for every $g$ the complex 
$\mathfrak{G}^g = \{G^g \mid G \in \mathfrak{G}\}$ is a subgroup of 
$\mathfrak{G}$: $G^g H^{-g} = (GH^{-1})^g$; and dually, for fixed $g$, the 
set of elements $G \in \mathfrak{G}$ for which $G^g = 1$ is a subgroup 
$\mathfrak{G}_g$: for if $G_1, G_2 \in \mathfrak{G}_g$, then 
$(G_1 G_2^{-1})^g = G_1^g G_2^{-g} = 1$. 

If $G, H \in \mathfrak{G}$ are of finite order and $G^g = 1$, $H^h = 1$, 
then $(GH^{-1})^{gh} = 1$. Thus the elements of finite order in 
$\mathfrak{G}$ form a subgroup $\mathfrak{T}$, the so-called torsion group of 
$\mathfrak{G}$. For example, the torsion group of $P^{\times}$ (§1.2.3) is 
the subgroup $\{1, -1\}$. The torsion group $\mathfrak{T}$ itself is not 
necessarily of finite order, as is shown by the example $P^{+}/I^{+}$ 
(§1.2.2): for if $g/h \in P^{+}$, $g, h \in I^{+}$, then for $h$ summands we 
have 


    $\left(I^{+} + \frac{g}{h}\right) + \dots + \left(I^{+} + \frac{g}{h}\right) = I^{+} + g = I^{+}$, 

so that $P^{+}/I^{+}$ is identical with its torsion group. But $P^{+}/I^{+}$ 
is not of finite order, since the elements $1, \frac{1}{2}, \frac{1}{3}, \dots$ 
lie in distinct cosets of $I^{+}$. 

Abelian groups in which only the unit element is of finite order are 
said to be torsion-free. Thus the cyclic group of infinite order is 
torsion-free, and so are all direct products of torsion-free groups. 
Furthermore, for an arbitrary Abelian group the factor group by its torsion 
group is torsion-free: for if $(\mathfrak{T}G)^m = \mathfrak{T}$, then 
$G^m = T \in \mathfrak{T}$; and if $T^t = 1$, then $G^{mt} = 1$, so that 
$G \in \mathfrak{T}$. Thus $\mathfrak{T}$ is the only element of 
$\mathfrak{G}/\mathfrak{T}$ that is of finite order. 


9.2. Finite and Finitely Generated Abelian Groups 

The purpose of the present section is as follows: by making use of the 
direct product we reduce the type problem for finite Abelian groups to 
the same problem for cyclic groups, for which it was already solved in 
§5.2. In speaking of direct products we include the case that the 
product has only one factor. 

9.2.1. We now prove the fundamental theorem on finite Abelian groups: 
a finite Abelian group $\mathfrak{G}$ is the direct product of cyclic groups 
of prime power order: 


(1)   $\mathfrak{G} = \langle G_1 \rangle \times \dots \times \langle G_r \rangle$, $|\langle G_\rho \rangle| = p_\rho^{\nu_\rho}$, $p_\rho$ prime. 


To prove this theorem we first prove the following two lemmas: 


9.2.2. If the exponent of a finite Abelian group is not a prime power, 
the group is decomposable. 


9.2.3. If the exponent of a finite Abelian group $\mathfrak{G}$ is a prime 
power and if $M$ is an element of maximal order in $\mathfrak{G}$, then there 
exists a direct decomposition 

    $\mathfrak{G} = \langle M \rangle \times \mathfrak{S}$ 

with a suitable subgroup $\mathfrak{S} \subseteq \mathfrak{G}$. 

From these two lemmas it is clear that a finite Abelian group which is not 
cyclic of prime power exponent must allow a proper decomposition 


    $\mathfrak{G} = \mathfrak{H} \times \mathfrak{J}$; $|\mathfrak{H}|, |\mathfrak{J}| < |\mathfrak{G}|$. 

Since by the theorems on cyclic groups the exponent of a cyclic group is 
equal to the order of the group, the fundamental theorem now follows 
immediately by complete induction on the order of the group and iterated 
construction of the direct product. 

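Proof of §9.2.2. We assume that the exponent $e$ of $\mathfrak{G}$ is the 
product of mutually prime positive integers $\neq 1$: 

    $e = ab$, $(a, b) = 1$, 

and then show that 

(2)   $\mathfrak{G} = \mathfrak{G}^a \times \mathfrak{G}^b$. 

For this purpose we determine $\bar{a}$, $\bar{b}$ such that 
$a\bar{a} + b\bar{b} = 1$. Then $G = (G^a)^{\bar{a}} (G^b)^{\bar{b}}$ for all 
$G \in \mathfrak{G}$, and thus 

    $\mathfrak{G} = \mathfrak{G}^a \mathfrak{G}^b$. 

Let $G_1^a = G_2^b \in \mathfrak{G}^a \cap \mathfrak{G}^b$; 
$G_1, G_2 \in \mathfrak{G}$. Then 

    $(G_1^a)^{\bar{a}} = (G_2^b)^{\bar{a}}$, 
    $G_1^{1 - b\bar{b}} = G_2^{b\bar{a}}$, 
    $G_1 = (G_1^{\bar{b}} G_2^{\bar{a}})^b$. 

Since $e = ab$, it follows that 
$G_1^a = (G_1^{\bar{b}} G_2^{\bar{a}})^{ab} = 1$ and 
$\mathfrak{G}^a \cap \mathfrak{G}^b = 1$, so that we are dealing here with a 
direct product (2). But $a$ and $b$ are $\neq 1$, so that by definition of 
the exponent $\mathfrak{G}^a, \mathfrak{G}^b \neq 1$. 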
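
The construction in this proof can be followed on a concrete group. The sketch below is written additively, so that the power G^a becomes the multiple a·g; it takes the residues modulo 12 (exponent e = 12 = 4 · 3), which is an illustrative choice rather than an example from the text, and checks each step of the argument.

```python
n, a, b = 12, 4, 3            # exponent e = 12 = a * b with (a, b) = 1
a_bar, b_bar = 1, -1          # chosen so that a * a_bar + b * b_bar = 1

G = set(range(n))
G_a = {(a * g) % n for g in G}    # additive analogue of the complex G^a
G_b = {(b * g) % n for g in G}    # additive analogue of the complex G^b

bezout_ok = a * a_bar + b * b_bar == 1
# every g splits as a_bar * (a g) + b_bar * (b g)
splits = all(g == (a_bar * (a * g) + b_bar * (b * g)) % n for g in G)
product_is_G = {(x + y) % n for x in G_a for y in G_b} == G
meet_is_trivial = (G_a & G_b) == {0}

print(sorted(G_a), sorted(G_b))
print(bezout_ok, splits, product_is_G, meet_is_trivial)
```

Both factors are different from the trivial subgroup, so the decomposition is proper, as the last sentence of the proof requires.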

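Proof of §9.2.3. If $\mathfrak{G}$ is cyclic, there is nothing to prove. If 
not, we first construct a subgroup $\mathfrak{H} \neq 1$ whose intersection 
with $\langle M \rangle$ is 1. To do this, we let the exponent of 
$\mathfrak{G}$ be a power of the prime $p$. The theorems on the subgroups of 
cyclic groups show that for $G \notin \langle M \rangle$, 

    $\langle G \rangle \cap \langle M \rangle = \langle G^{p^s} \rangle = \langle M^{p^r} \rangle$, 

so that 

    $p^s \cdot |\langle G^{p^s} \rangle| = |\langle G \rangle|$, 
    $p^r \cdot |\langle M^{p^r} \rangle| = |\langle M \rangle|$ 

with, let us say, $G^{p^s} = M^{a p^r}$. By the maximality of $M$ we then 
have $s \le r$. For the element $H = G M^{-a p^{r-s}}$, which is different 
from 1, it follows that $H^n \in \langle M \rangle$ is equivalent to 
$G^n \in \langle M \rangle$, so that 


    $\langle H \rangle \cap \langle M \rangle = \langle H^{p^s} \rangle$. 

But 

    $H^{p^s} = G^{p^s} M^{-a p^r} = 1$, 

and therefore 


    $\langle H \rangle \cap \langle M \rangle = 1$. 


Consequently, $\mathfrak{H} = \langle H \rangle$ has the desired property. 

The factor group $\mathfrak{G}/\mathfrak{H}$ is also of prime power exponent, 
so that each of its elements is of prime power order. Also, $\mathfrak{H}M$ 
is of maximal order in $\mathfrak{G}/\mathfrak{H}$; for if 
$(\mathfrak{H}M)^n = \mathfrak{H}$, $(\mathfrak{H}M^*)^n \neq \mathfrak{H}$, 
$M^* \in \mathfrak{G}$, it would follow that $M^n \in \mathfrak{H}$, so that 
by the construction of $\mathfrak{H}$ 


    $M^n = 1$ 

and 


    $M^{*n} \notin \mathfrak{H}$, 


so that $M^{*n} \neq 1$ and $M^*$ would be of greater order than $M$, in 
contradiction to the choice of $M$. But now the assertion of §9.2.3 follows 
immediately by complete induction on the order of $\mathfrak{G}$, since it 
holds trivially for $|\mathfrak{G}| = 1$, and since 
$\mathfrak{G}/\mathfrak{H}$ is of smaller order than $\mathfrak{G}$, there 
exists a direct decomposition 


    $\mathfrak{G}/\mathfrak{H} = \langle \mathfrak{H}M \rangle / \mathfrak{H} \times \mathfrak{S}/\mathfrak{H}$ 


with a subgroup $\mathfrak{S}$ of $\mathfrak{G}$. In other words, 
$\langle \mathfrak{H}M \rangle \mathfrak{S} = \mathfrak{G}$, 
$\langle \mathfrak{H}M \rangle \cap \mathfrak{S} = \mathfrak{H}$. 

But from $\langle \mathfrak{H}M \rangle = \mathfrak{H}\langle M \rangle$ and 
$\mathfrak{H} \subseteq \mathfrak{S}$ we then have 
$\langle M \rangle \mathfrak{S} = \mathfrak{G}$ and 
$\langle M \rangle \cap \mathfrak{S} \subseteq \mathfrak{H}$. Thus it follows 
from the choice of $\mathfrak{H}$ that 
$\langle M \rangle \cap \mathfrak{S} = 1$. 
Thus the proof of the fundamental theorem is complete. 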


9.2.4. The fundamental theorem provides the complete decomposition 
of a given group, as can be seen from the following fact: every cyclic 
group $\langle G \rangle$ of prime power order $p^{\nu}$ is indecomposable. 
For if $\langle G \rangle$ were decomposable, each of the factors would 
contain a minimal subgroup of order $p$, in contradiction to the fact that a 
cyclic group cannot have more than one subgroup of a given order. 


9.2.5. The cyclic groups allow us to solve the type problem for a more 
general class of groups, in view of the theorem: 

If $\mathfrak{G}$ is a finitely generated Abelian group, then 
$\mathfrak{G}$ is a direct product of finite cyclic groups, whose orders are 
prime powers, and of infinite cyclic groups: 


(3)   $\mathfrak{G} = \langle G_1 \rangle \times \dots \times \langle G_r \rangle \times \langle H_1 \rangle \times \dots \times \langle H_s \rangle$; 
      $|\langle G_i \rangle| = p_i^{\nu_i}$, $p_i$ prime, $i = 1, \dots, r$; $|\langle H_k \rangle| = \infty$, $k = 1, \dots, s$. 


It is obvious that the torsion group of $\mathfrak{G}$ is given by 


    $\mathfrak{T} = \langle G_1 \rangle \times \dots \times \langle G_r \rangle$, 

which is finite. 

We shall omit the proof of this more general theorem, since it does not 
require the introduction of any essentially new ideas. The chief difficulty 
lies in replacing the complete induction on the order of the group by 
complete induction on the number of generators, which may become 
laborious, since it is not always easy to see whether a proper subgroup 
has a smaller number of generators than the group itself. 


9.2.6. By means of the theorem on the subgroups of a cyclic group it 
is easy to prove that an infinite cyclic group is indecomposable. Then the 
theorem in §8.3 on the uniqueness of direct decompositions states that 
for finitely generated Abelian groups the numbers 
$p_1^{\nu_1}, \dots, p_r^{\nu_r}$ and $s$ in (3) are uniquely determined by 
$\mathfrak{G}$. On account of their importance in combinatorial topology, 
these numbers $p_1^{\nu_1}, \dots, p_r^{\nu_r}$ are called the torsion 
numbers of $\mathfrak{G}$, and $s$ is the Betti number of $\mathfrak{G}$. 


9.2.7. From the decomposition (1) it is clear that we can determine 
the exponent $e$ of $\mathfrak{G}$ as follows: for every prime $p$ that 
divides $\mathfrak{G} : 1$ we determine the group $\langle G_\rho \rangle$ of 
highest $p$-power order $p^{\nu_p}$; then 

(4)   $e = \prod_{p \mid \mathfrak{G} : 1} p^{\nu_p}$. 
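
Formula (4) can be compared with the definition of the exponent on a small direct product. The sketch below assumes a hypothetical group given as a direct product of cyclic groups of orders 2, 4 and 3, written additively as tuples of residues; the orders and all names are illustrative.

```python
from itertools import product
from math import gcd, lcm, prod

orders = [2, 4, 3]   # prime-power orders of the cyclic factors (hypothetical)

def element_order(x):
    """Order of x = (x_1, ..., x_r): lcm of the orders of its components."""
    return lcm(*(n // gcd(n, xi) for n, xi in zip(orders, x)))

# Exponent by definition: lcm of the orders of all elements.
all_elements = product(*(range(n) for n in orders))
by_definition = lcm(*(element_order(x) for x in all_elements))

# Exponent by formula (4): for each prime, the highest prime-power order.
highest = {}
for q in orders:
    p = min(d for d in range(2, q + 1) if q % d == 0)   # the prime dividing q
    highest[p] = max(highest.get(p, 1), q)
by_formula = prod(highest.values())

print(by_definition, by_formula)
```

Both computations give 12, although the group has order 24; the factor of order 2 does not contribute because a factor of higher 2-power order is present.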

9.2.8. Examples. The reader may consider for himself how the general 
finite cyclic groups are included in the preceding theorems and how they 
are to be decomposed into direct products of cyclic groups of prime power 
order. 

The expression (4) for the exponent enables us to give a complete 
description of the structure of finite groups with exponent 2. As pointed 
out in §5.3, these groups are Abelian, so that they must be direct products 
of cyclic groups of order 2. Thus, for example, the type of all groups 
$\mathfrak{B}^{\mathfrak{M}}$ with finite $|\mathfrak{M}|$ has been 
satisfactorily described. 

As a further example of the preceding theorems we shall show how they 
apply to the group §1.2.11. The torsion group $\mathfrak{T}$ of 
$\mathfrak{E}^{(3)}$ consists of the elements $1, -1$, since these are the 
only real numbers for which some power is equal to 1. If we denote by 
$\mathfrak{E}^{(3)}_{+}$ the set of positive numbers in $\mathfrak{E}^{(3)}$, 
then $\mathfrak{E}^{(3)}_{+}$ is obviously a subgroup of 
$\mathfrak{E}^{(3)}$. Since every number in $\mathfrak{E}^{(3)}$ can be 
written in the form $1 \cdot \varepsilon$ or $(-1) \cdot \varepsilon$ with an 
element $\varepsilon \in \mathfrak{E}^{(3)}_{+}$ and since 
$\mathfrak{T} \cap \mathfrak{E}^{(3)}_{+} = 1$, we have the direct product 
decomposition 
$\mathfrak{E}^{(3)} = \mathfrak{T} \times \mathfrak{E}^{(3)}_{+}$. 

Let us investigate the group $\mathfrak{E}^{(3)}_{+}$. To every real number 
$g + h\sqrt{3}$ with integral $g$, $h$ we assign the lattice point $(g, h)$ 
in a Cartesian coordinate system. Then the elements of $\mathfrak{E}^{(3)}$ 
correspond to the lattice points of the hyperbola $x^2 - 3y^2 = 1$. We now 
consider the set $\mathfrak{g}$ of parallel lines 


    $x + y\sqrt{3} = \rho$, $\rho$ real $> 1$. 


As can be seen at once from Figure 11, $\mathfrak{g}$ consists of the lines 
that are parallel to the line $x + y\sqrt{3} = 1$ and lie above it to the 
right. Thus there exists a line $x + y\sqrt{3} = \varepsilon_0$ in 
$\mathfrak{g}$, with $\varepsilon_0 > 1$, that passes through a lattice point 
of the hyperbola and has the minimal $\varepsilon_0$ for all such lines. 
Then $\varepsilon_0$ is an element of $\mathfrak{E}^{(3)}_{+}$, and we can 
show that the elements $\varepsilon \in \mathfrak{E}^{(3)}_{+}$ are the 
powers of $\varepsilon_0$. For let us choose an integer $m$ such that 


    $\varepsilon_0^m \le \varepsilon < \varepsilon_0^{m+1}$. 


Then 


    $1 \le \frac{\varepsilon}{\varepsilon_0^m} < \varepsilon_0$. 

Now, $\varepsilon / \varepsilon_0^m = \bar{\varepsilon}$ is an element of 
$\mathfrak{E}^{(3)}_{+}$, so that the line 
$x + y\sqrt{3} = \bar{\varepsilon}$ passes through a lattice point of the 
hyperbola. From the minimal property of $\varepsilon_0$ it follows that 
$\bar{\varepsilon} = 1$ and $\varepsilon_0^m = \varepsilon$. Since the powers 
$\varepsilon_0^m$ form an infinite set of numbers, the group 
$\mathfrak{E}^{(3)}_{+}$ is a cyclic group of infinite order. 

Fig. 11 
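
The argument can be illustrated with exact integer arithmetic. The sketch below represents a number g + h√3 by the pair (g, h), takes 2 + √3 as the candidate for the minimal element greater than 1 (an assumption that the brute-force search then confirms within the chosen bound), and compares its powers with all lattice points of the hyperbola having positive coordinates below a hypothetical bound of 400.

```python
def mul(u, v):
    """Product (g1 + h1*sqrt3)(g2 + h2*sqrt3), returned as an integer pair."""
    (g1, h1), (g2, h2) = u, v
    return (g1 * g2 + 3 * h1 * h2, g1 * h2 + h1 * g2)

BOUND = 400   # hypothetical search bound for the brute-force comparison

# Lattice points on x^2 - 3 y^2 = 1 with x, y > 0, i.e. the elements > 1.
points = sorted(
    (x, y)
    for x in range(1, BOUND)
    for y in range(1, BOUND)
    if x * x - 3 * y * y == 1
)

eps0 = (2, 1)                 # 2 + sqrt3, the smallest point found above
powers, u = [], eps0
while u[0] < BOUND:
    powers.append(u)
    u = mul(u, eps0)

print(points)
print(powers)
```

Within the bound, the lattice points with positive coordinates are exactly the successive powers of 2 + √3, as the minimality argument predicts.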


9.3. Group Properties of Ornaments 


We shall now give an example to show how the symmetry of ornaments 
can be analyzed by means of group concepts. The characteristic feature 
of the ornaments of interest to us is “‘infinite repetition,” by which we 
mean the repetition of a definite pattern, the ‘‘elementary ornament,” 
at equal distances. This repetition, or translation, can take place in one, 
two, or three noncoplanar directions, with the corresponding classification 
of ornaments into linear (that is, occurring on a band or strip), planar, 
and spatial types, the last two of which play an important role in
crystallography. Whether there exist other symmetry operations, in addition
to these pure translations, depends on the nature of the given ornament.

Fig. 12

In illustration of these remarks let us consider the classical decoration
shown in Figure 12. If we interpret this ornamental strip as a sequence of
partially overlapping discs situated in three-dimensional space, then the
strip can be mapped onto itself by the following, simple or composite,
operations:


Translations by distances that are multiples of a certain elementary vector e 
in the direction of the axis of the ornament. 

Reflection in the plane perpendicular to the strip and through the axis, 
with or without simultaneous translation by a multiple of the 
elementary vector (longitudinal reflections). 

Reflections in planes that are perpendicular to the axis of the ornament 
and are at a distance from one another of half the elementary 
distance e (transverse reflections; in the figure the traces of these 
planes are marked by dots and dashes). 

Rotations by 180° about transverse axes in the plane of the strip, where 
the axes are again at a distance from one another of half the 
elementary distance (half-turns around a transverse axis; in the 
figure the axes are marked by dashes). 

Rotations around centers at a distance from one another of half the 
elementary distance (in the figure these centers are indicated by small 
circles). 

Translations by an odd multiple of half the elementary vector with 
simultaneous reflection in the plane of the ornament, so that the 
overlappings of the circles are reversed (planar glide reflections). 

Translations by an odd multiple of half the elementary vector with 
simultaneous rotation of 180° around the longitudinal axis (spiral 
motions). 

Rotations around points separated by half the elementary distance 
(marked in the figure by small black circles) with simultaneous 
reflection in the plane of the ornament (rotatory reflections). 


The schematic representation in Figure 13 shows the structure of the 
ornament. 


It is obvious that these mappings are the elements of an infinite non- 
Abelian group. We divide them into the complexes listed in the above 
classification; among these complexes the translations, including the 
identity, clearly form a normal subgroup. We can gain insight into the 
structure of the ornament by forming the factor group with respect to 
translations. 

In order to investigate this factor group, we make use again of the above 
schema, consisting of overlapping triangles with the same structure as 
the ornament. We consider each elementary part of the ornament as being 
divided into eight fields, each of which contains a figure consisting of two 
overlapping triangles; this figure can be mapped onto the other figures by 
an operation of the group, indicated below by primes. 


1. The translations

$B_1^{(0)}: 1 \to 1$ (the identity mapping),

$B_1^{(1)}: 1 \to 1'$ (translation by $e$),

$B_1^{(-1)}: 1 \to 1''$ (translation by $-e$), and so forth

form the normal subgroup and the unit element of the corresponding
factor group is $\mathfrak{B}_1 = \{B_1^{(n)} \mid n = 0, \pm 1, \pm 2, \ldots\}$.


2. The mappings

$B_2^{(0)}: 1 \to 2$,

$B_2^{(1)}: 1 \to 2'$,

$B_2^{(-1)}: 1 \to 2''$, and so forth

represent half-turns. All these motions arise from a single $B_2$ by
complex-multiplication with the group of translations:
$\mathfrak{B}_2 = \{B_2^{(n)}\} = \mathfrak{B}_1 B_2$.


3. The mappings

$B_3^{(0)}: 1 \to 3$,

$B_3^{(1)}: 1 \to 3'$,

$B_3^{(-1)}: 1 \to 3''$, and so forth

characterize the complex of planar glide reflections. This complex
arises from one element $B_3$ as follows:
$\mathfrak{B}_3 = \{B_3^{(n)}\} = \mathfrak{B}_1 B_3$.


4. The mappings

$B_4^{(0)}: 1 \to 4$,

$B_4^{(1)}: 1 \to 4'$,

$B_4^{(-1)}: 1 \to 4''$, and so forth

are reflections in transverse axes; they may be written as cosets of
$\mathfrak{B}_1$ with arbitrary $B_4$:
$\mathfrak{B}_4 = \{B_4^{(n)}\} = \mathfrak{B}_1 B_4$.


5. Reflections and glide reflections in the longitudinal axis

$B_5^{(0)}: 1 \to 5$,

$B_5^{(1)}: 1 \to 5'$,

$B_5^{(-1)}: 1 \to 5''$, and so forth

are elements of a complex which may be represented by
$\mathfrak{B}_5 = \{B_5^{(n)}\} = \mathfrak{B}_1 B_5$.


6. From

$B_6^{(0)}: 1 \to 6$,

$B_6^{(1)}: 1 \to 6'$,

$B_6^{(-1)}: 1 \to 6''$, and so forth

we obtain the rotatory reflections. Here
$\mathfrak{B}_6 = \{B_6^{(n)}\} = \mathfrak{B}_1 B_6$.


7. The helical motions

$B_7^{(0)}: 1 \to 7$,

$B_7^{(1)}: 1 \to 7'$,

$B_7^{(-1)}: 1 \to 7''$, and so forth

form the complex $\mathfrak{B}_7 = \{B_7^{(n)}\} = \mathfrak{B}_1 B_7$.


8. Finally, the simple rotations around centers

$B_8^{(0)}: 1 \to 8$,

$B_8^{(1)}: 1 \to 8'$,

$B_8^{(-1)}: 1 \to 8''$, and so forth

may be represented in the form
$\mathfrak{B}_8 = \{B_8^{(n)}\} = \mathfrak{B}_1 B_8$.


The symmetry of the ornament is now characterized by the structure
of the factor group in the following way: if $\mathfrak{G}$ is an infinite
non-Abelian group of mappings of the ornamental strip onto itself, then the
factor group with respect to the translations

$$\mathfrak{G}/\mathfrak{B}_1 = \{\mathfrak{B}_1, \mathfrak{B}_2, \mathfrak{B}_3, \mathfrak{B}_4, \mathfrak{B}_5, \mathfrak{B}_6, \mathfrak{B}_7, \mathfrak{B}_8\}$$

is Abelian. The element $\mathfrak{B}_1$ is the unit element, and every
element is inverse to itself (involution). The multiplication table is as
follows:


The group belongs to the type dealt with in the second paragraph of 
§9.2.8. Since it is of order 8, the theorem of Lagrange suggests that 
we look for subgroups of order 4 and 2. Such subgroups are to be 
found in the table, or more simply by a glance at the schematic figure
(Figure 14).

The full ornament (occupying eight fields) has the structure of the
group $\mathfrak{G}/\mathfrak{B}_1$ (holohedrism).

If we reduce the number of fields to four, we obtain seven different 
possibilities, corresponding to suitable choices of the occupied fields, for 
ornaments with the structure of the subgroups of order 4 (hemihedrism). 


Fig. 14 


If we restrict ourselves to two fields, we can find seven further
arrangements, which have the structure of the subgroups $\mathfrak{V}_i$ of
order 2 (tetartohedrism).

Finally, if only one field is occupied, we have the unit element of the 
factor group. 
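
The counts of seven subgroups of order 4 and seven of order 2 follow from
the fact that the factor group is Abelian of order 8 with every element an
involution. The following Python sketch checks this under an assumption
made here for illustration: the eight cosets are modeled as the 3-bit
integers 0, ..., 7 under bitwise XOR, with 0 as the unit element. The
labeling of the cosets by integers is hypothetical and does not reproduce
the multiplication table of the text.

```python
from itertools import combinations

ELEMENTS = range(8)  # 3-bit integers; the group operation is XOR, the unit is 0


def is_subgroup(subset):
    """A finite subset is a subgroup if it contains the unit and is closed."""
    s = set(subset)
    return 0 in s and all((a ^ b) in s for a in s for b in s)


def subgroups_of_order(k):
    """All subgroups with exactly k elements, found by brute force."""
    return [c for c in combinations(ELEMENTS, k) if is_subgroup(c)]


order4 = subgroups_of_order(4)  # correspond to the hemihedrisms
order2 = subgroups_of_order(2)  # correspond to the tetartohedrisms
print(len(order4), len(order2))
```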

In Figures 15 through 29 we give examples of ornaments with the 
structure of the various subgroups. 

In addition to the eight symmetry operations described above, there 
exist three others that can appear in ornaments. The number of possible 
holohedrisms is thereby increased to four; they contain sixteen hemi- 
hedrisms and ten tetartohedrisms, the unit element in each case being the 
group of translations. 


Groups of the same order are isomorphic to one another. Figure 30 
shows the lattice of subgroups of the holohedrism. 


Figs. 15-21. Ornaments with the structure of the subgroups
$\mathfrak{U}_1, \ldots, \mathfrak{U}_7$ of order 4, one figure for each
subgroup; each subgroup consists of $\mathfrak{B}_1$ and three further
complexes.

Figs. 22-28. Ornaments with the structure of the subgroups of order 2:

Fig. 22. Group $\mathfrak{V}_1 = \{\mathfrak{B}_1, \mathfrak{B}_2\}$

Fig. 23. Group $\mathfrak{V}_2 = \{\mathfrak{B}_1, \mathfrak{B}_3\}$

Fig. 24. Group $\mathfrak{V}_3 = \{\mathfrak{B}_1, \mathfrak{B}_4\}$

Fig. 25. Group $\mathfrak{V}_4 = \{\mathfrak{B}_1, \mathfrak{B}_5\}$

Fig. 26. Group $\mathfrak{V}_5 = \{\mathfrak{B}_1, \mathfrak{B}_6\}$

Fig. 27. Group $\mathfrak{V}_6 = \{\mathfrak{B}_1, \mathfrak{B}_7\}$

Fig. 28. Group $\mathfrak{V}_7 = \{\mathfrak{B}_1, \mathfrak{B}_8\}$

Fig. 29. Group $\mathfrak{B}_1$


10. The Homomorphism Theorem


10.1. In §4.2 we considered one-to-one mappings $\lambda$ of a group
$\mathfrak{G}$ for which

$$(G_1G_2)^\lambda = G_1^\lambda G_2^\lambda; \qquad G_1, G_2 \in \mathfrak{G}. \tag{1}$$

The following important generalization is obtained by dropping the
requirement that the mapping be one-to-one. A mapping $\lambda$ of a group
$\mathfrak{G}$ into a group $\mathfrak{H}$ that satisfies (1) is called a
homomorphism of $\mathfrak{G}$ into $\mathfrak{H}$. The set of elements of
$\mathfrak{G}$ that are mapped onto the unit element $1_{\mathfrak{H}}$ of
$\mathfrak{H}$ is denoted by $\mathfrak{G}_\lambda$ and is called the kernel
of $\lambda$: $\mathfrak{G}_\lambda = \{G \mid G \in \mathfrak{G},
G^\lambda = 1_{\mathfrak{H}}\}$; the set $\mathfrak{G}^\lambda =
\{G^\lambda \mid G \in \mathfrak{G}\}$ is called the image of $\mathfrak{G}$
under the mapping $\lambda$, or simply the image under $\lambda$ (see
Figure 31).

Fig. 31

The rules §4.2.1-5 are proved for homomorphisms in the same way as for
isomorphisms.

The following homomorphism theorem shows that homomorphisms offer a very
useful means of finding normal subgroups in a group and of investigating
their factor groups:


The kernel of a homomorphism $\lambda$ of $\mathfrak{G}$ into $\mathfrak{H}$
is a normal subgroup of $\mathfrak{G}$, and the image of $\mathfrak{G}$ under
$\lambda$ is a subgroup of $\mathfrak{H}$. Moreover,
$\mathfrak{G}/\mathfrak{G}_\lambda \cong \mathfrak{G}^\lambda$.


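Proof. From $G_1^\lambda = G_2^\lambda = 1_{\mathfrak{H}}$ it follows that
$1_{\mathfrak{H}} = G_1^\lambda (G_2^\lambda)^{-1} = (G_1G_2^{-1})^\lambda$,
so that $\mathfrak{G}_\lambda$ is a subgroup of $\mathfrak{G}$. Also, for
every $X \in \mathfrak{G}$ and $G \in \mathfrak{G}_\lambda$,

$$(XGX^{-1})^\lambda = X^\lambda G^\lambda (X^\lambda)^{-1} = X^\lambda (X^\lambda)^{-1} = 1_{\mathfrak{H}},$$

so that $\mathfrak{G}_\lambda$ is a normal subgroup of $\mathfrak{G}$. For
arbitrary $X, Y \in \mathfrak{G}$ we have
$X^\lambda (Y^\lambda)^{-1} = (XY^{-1})^\lambda$, so that
$\mathfrak{G}^\lambda$ is a subgroup of $\mathfrak{H}$. In order to show that
$\mathfrak{G}/\mathfrak{G}_\lambda \cong \mathfrak{G}^\lambda$ we consider
the mapping induced by $\lambda$ on the set of cosets of
$\mathfrak{G}/\mathfrak{G}_\lambda$. This mapping is in fact a mapping onto
$\mathfrak{G}^\lambda$, since

$$(\mathfrak{G}_\lambda X)^\lambda = \mathfrak{G}_\lambda^\lambda X^\lambda = X^\lambda \quad \text{for } X \in \mathfrak{G}.$$

The mapping is one-to-one, since
$(\mathfrak{G}_\lambda X_1)^\lambda = (\mathfrak{G}_\lambda X_2)^\lambda$
implies $X_1^\lambda = X_2^\lambda$, which means that
$(X_1X_2^{-1})^\lambda = 1_{\mathfrak{H}}$,
$X_1X_2^{-1} \in \mathfrak{G}_\lambda$ and
$\mathfrak{G}_\lambda X_1 = \mathfrak{G}_\lambda X_2$. Finally,

$$(\mathfrak{G}_\lambda X_1 \mathfrak{G}_\lambda X_2)^\lambda = (\mathfrak{G}_\lambda X_1X_2)^\lambda = (X_1X_2)^\lambda = X_1^\lambda X_2^\lambda = (\mathfrak{G}_\lambda X_1)^\lambda (\mathfrak{G}_\lambda X_2)^\lambda.$$

Thus we have shown that the induced mapping is an isomorphism of
$\mathfrak{G}/\mathfrak{G}_\lambda$ onto $\mathfrak{G}^\lambda$.

In order to complete the homomorphism theorem we note that every
normal subgroup $\mathfrak{N}$ of a group $\mathfrak{G}$ is the kernel of a
homomorphism of $\mathfrak{G}$, since the mapping $G \to \mathfrak{N}G$ is a
homomorphism onto $\mathfrak{G}/\mathfrak{N}$, as is easily seen from §6(2),
and its kernel is obviously $\mathfrak{N}$.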
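
The theorem can be traced through a small case. The following Python sketch
uses an example chosen here, not taken from the text: the additive cyclic
group of integers modulo 12 and the mapping $x \to 3x$. It computes the
kernel, the image, and the cosets of the kernel, and records the set of
images of each coset.

```python
N = 12  # hypothetical example: the additive group of integers modulo 12


def lam(x):
    """The mapping x -> 3x (mod 12); additive, hence a homomorphism."""
    return (3 * x) % N


G = list(range(N))
is_homomorphism = all(lam((a + b) % N) == (lam(a) + lam(b)) % N for a in G for b in G)

kernel = [x for x in G if lam(x) == 0]
image = sorted({lam(x) for x in G})

# Cosets of the kernel, and the set of images of each coset under lam.
cosets = {frozenset((k + x) % N for k in kernel) for x in G}
induced = {c: {lam(x) for x in c} for c in cosets}

print(kernel, image, len(cosets))
```

Each coset is mapped onto a single element of the image, and different
cosets go to different elements, so the induced mapping is one-to-one and
onto, as in the proof.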


10.2. Let us consider two applications of the homomorphism theorem.
For the first example let $\mathfrak{G} = K_n^\times$ (i.e., the set of
nonsingular $n \times n$ matrices over a field $K$; see §1.2.12) and let a
mapping $\lambda$ from $K_n^\times$ into the multiplicative group $K^\times$
of the field $K$ be defined as follows: to every matrix the mapping
$\lambda$ assigns the value of its determinant:

$$A^\lambda = |A|, \qquad A \in K_n^\times.$$

Then, by the rule for multiplication of determinants, $\lambda$ is a
homomorphism and its kernel is the group of matrices with determinant 1.
Thus the latter group is a normal subgroup of $K_n^\times$. But every
element of $K^\times$ is an image under $\lambda$, since $k \in K^\times$ is
the image of

$$\begin{pmatrix} k & & & 0 \\ & 1 & & \\ & & \ddots & \\ 0 & & & 1 \end{pmatrix}.$$

Thus the factor group of this normal subgroup is isomorphic to $K^\times$.
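
For a finite field the statement can be verified by counting. The following
Python sketch assumes, purely for illustration, the field of integers modulo
5 and $n = 2$; the tuple encoding of matrices and the sampling step are
choices made here.

```python
from itertools import product

P = 5  # hypothetical choice: the field of integers modulo 5


def det(m):
    """Determinant of the 2 x 2 matrix (a, b; c, d) stored as a 4-tuple."""
    a, b, c, d = m
    return (a * d - b * c) % P


def mul(m, n):
    """Matrix product modulo P."""
    a, b, c, d = m
    e, f, g, h = n
    return ((a * e + b * g) % P, (a * f + b * h) % P,
            (c * e + d * g) % P, (c * f + d * h) % P)


GL = [m for m in product(range(P), repeat=4) if det(m) != 0]  # nonsingular matrices
SL = [m for m in GL if det(m) == 1]                           # kernel of the determinant

# Check the multiplication rule for determinants on a sample of pairs.
sample = GL[::37]
multiplicative = all(det(mul(m, n)) == (det(m) * det(n)) % P
                     for m in sample for n in sample)

image = {det(m) for m in GL}
print(len(GL), len(SL), sorted(image))
```

The index of the kernel equals the number of nonzero field elements, as the
homomorphism theorem requires.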

For the next example we take $\mathfrak{G} = \mathfrak{B}_2$ (§1.2.6). If
for $\sigma \in \mathfrak{B}_2$ we set $\epsilon_\sigma = +1$ or
$\epsilon_\sigma = -1$, according to whether the orientation of a triangle
in $E_2$ is preserved or changed by the motion $\sigma$ (that is, whether
$\sigma$ involves or does not involve a half-turn of the plane), then
obviously $\epsilon_{\sigma_1} \epsilon_{\sigma_2} = \epsilon_{\sigma_1\sigma_2}$.
Thus the mapping $\sigma \to \epsilon_\sigma$ is a homomorphism of
$\mathfrak{B}_2$ onto the cyclic group of order 2. By the homomorphism
theorem its kernel is a normal subgroup of $\mathfrak{B}_2$ whose cyclic
factor group is of order 2. This group is called the group of proper
motions in $E_2$ and is denoted by $\mathfrak{B}_2^+$.

In $\mathfrak{B}_2^+$ we can also construct a normal subgroup. For this
purpose we first note that in a proper motion the angle between a line and
its image is the same for all lines. If to each element
$\sigma \in \mathfrak{B}_2^+$ we assign this angle $\omega_\sigma$ in radian
measure, the product of two proper motions $\sigma_1\sigma_2$ corresponds
to the angle $\omega_{\sigma_1} + \omega_{\sigma_2}$, as follows from the
theorem on the exterior angles of a triangle. Since two angles are equal if
and only if their radian measure differs by a multiple of $2\pi$, the
mapping $\sigma \to \omega_\sigma$ is a homomorphism of $\mathfrak{B}_2^+$
onto the factor group $R^+/\langle 2\pi \rangle$, the so-called planar
rotation group. The kernel of this homomorphism is the group
$\mathfrak{T}_2$ of translations, that is, of proper motions in which the
image $g\sigma$ of every line $g$ is parallel to $g$. By the homomorphism
theorem the factor group $\mathfrak{B}_2^+/\mathfrak{T}_2$ is isomorphic to
$R^+/\langle 2\pi \rangle$.

Our examination of the structure of $\mathfrak{B}_2$ can now be brought to
an end with the remark that $\mathfrak{T}_2$ is isomorphic to the direct
product of two groups of type $R^+$ (§1.2.2), a fact whose proof we leave to
the reader. The structure of $\mathfrak{B}_3$ may be examined in the same
way.


11. The Isomorphism Theorem 


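11.1. In our discussion of direct products
$\mathfrak{G} = \mathfrak{M} \times \mathfrak{N}$ we noted that
$\mathfrak{G}/\mathfrak{N} \cong \mathfrak{M}$ and
$\mathfrak{G}/\mathfrak{M} \cong \mathfrak{N}$. We now prove an important
generalization of this fact.


Isomorphism theorem: if $\mathfrak{N}$ is a normal subgroup and
$\mathfrak{U}$ is a subgroup of $\mathfrak{G}$, then
$\mathfrak{N} \cap \mathfrak{U}$ is a normal subgroup of $\mathfrak{U}$, and

$$\mathfrak{N}\mathfrak{U}/\mathfrak{N} \cong \mathfrak{U}/\mathfrak{N} \cap \mathfrak{U}.$$

The above statement about direct products corresponds to the special
case that $\mathfrak{U}$ is also a normal subgroup of $\mathfrak{G}$ and
$\mathfrak{G} = \mathfrak{N}\mathfrak{U}$,
$\mathfrak{N} \cap \mathfrak{U} = 1$.


Proof. Let $X \in \mathfrak{U}$, $U \in \mathfrak{N} \cap \mathfrak{U}$; then
$XUX^{-1} \in \mathfrak{U}$ since $\mathfrak{U}$ is a group, and
$XUX^{-1} \in \mathfrak{N}$ since $\mathfrak{N}$ is a normal subgroup, so
that the subgroup $\mathfrak{N} \cap \mathfrak{U}$ is a normal subgroup of
$\mathfrak{U}$. The elements of $\mathfrak{N}\mathfrak{U}/\mathfrak{N}$ are
the cosets of the form $\mathfrak{N}U$, $U \in \mathfrak{U}$. For
$U_1, U_2 \in \mathfrak{U}$ the two statements
$\mathfrak{N}U_1 = \mathfrak{N}U_2$ and
$(\mathfrak{N} \cap \mathfrak{U})U_1 = (\mathfrak{N} \cap \mathfrak{U})U_2$
are equivalent to each other. Thus
$(\mathfrak{N}U)^\lambda = (\mathfrak{N} \cap \mathfrak{U})U$ defines a
one-to-one mapping $\lambda$ of $\mathfrak{N}\mathfrak{U}/\mathfrak{N}$ onto
$\mathfrak{U}/\mathfrak{N} \cap \mathfrak{U}$. Since

$$(\mathfrak{N}U_1\,\mathfrak{N}U_2)^\lambda = (\mathfrak{N}U_1U_2)^\lambda = (\mathfrak{N} \cap \mathfrak{U})U_1\,(\mathfrak{N} \cap \mathfrak{U})U_2 = (\mathfrak{N}U_1)^\lambda (\mathfrak{N}U_2)^\lambda,$$

the mapping $\lambda$ is an isomorphism of
$\mathfrak{N}\mathfrak{U}/\mathfrak{N}$ onto
$\mathfrak{U}/\mathfrak{N} \cap \mathfrak{U}$.

The various interrelations involved in the isomorphism theorem can be
represented very clearly in a diagram if in the graph of the corresponding
lattice we draw the line segments $\mathfrak{N}\mathfrak{U} - \mathfrak{N}$
and $\mathfrak{U} - \mathfrak{N} \cap \mathfrak{U}$ parallel to each other
and of equal length (see Figure 32).

Fig. 32

Then the geometric fact that the segments
$\mathfrak{N}\mathfrak{U} - \mathfrak{U}$ and
$\mathfrak{N} - \mathfrak{N} \cap \mathfrak{U}$ are necessarily parallel and
equal finds its group-theoretic expression in the fact that

$$\mathfrak{N} : \mathfrak{N} \cap \mathfrak{U} = \mathfrak{N}\mathfrak{U} : \mathfrak{U},$$

which follows at once from

$$\mathfrak{N}\mathfrak{U} : \mathfrak{N} \cap \mathfrak{U} = (\mathfrak{N}\mathfrak{U} : \mathfrak{N})(\mathfrak{N} : \mathfrak{N} \cap \mathfrak{U}) = (\mathfrak{N}\mathfrak{U} : \mathfrak{U})(\mathfrak{U} : \mathfrak{N} \cap \mathfrak{U})$$

and the isomorphism theorem.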
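
The index relations can be checked in a concrete case. The following Python
sketch is an illustration under assumptions made here: permutations of four
points are stored as tuples, $\mathfrak{N}$ is taken to be the four-group of
double transpositions (a normal subgroup of the symmetric group on four
points), and $\mathfrak{U}$ is the cyclic group generated by a 4-cycle.

```python
def compose(p, q):
    """Apply p first, then q; permutations are tuples with p[i] the image of i."""
    return tuple(q[p[i]] for i in range(len(p)))


def generate(gens):
    """The subgroup generated by the permutations in gens (finite closure)."""
    identity = tuple(range(len(gens[0])))
    group = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group


N = generate([(1, 0, 3, 2), (2, 3, 0, 1)])  # four-group of double transpositions
U = generate([(1, 2, 3, 0)])                # cyclic group generated by a 4-cycle
NU = {compose(n, u) for n in N for u in U}  # the complex NU
meet = N & U                                # the intersection of N and U

print(len(NU), len(N), len(U), len(meet))
```

Here $\mathfrak{N}\mathfrak{U} : \mathfrak{N}$ and
$\mathfrak{U} : \mathfrak{N} \cap \mathfrak{U}$ are both equal to 2, as the
isomorphism theorem requires.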


11.2. In the next section we shall use the isomorphism theorem to
prove an important result in the general theory of groups, but first let us
illustrate the theorem for the above group $\mathfrak{B}_2^+$ (§10.2) of
proper motions in $E_2$. It is easy to show, as in §3.3.4, that the set
$\mathfrak{B}_{2,P}^+$ of elements of $\mathfrak{B}_2^+$ leaving a point $P$
fixed is a subgroup of $\mathfrak{B}_2^+$. Obviously, this subgroup consists
of the planar rotations around the point $P$, so that like the factor group
$\mathfrak{B}_2^+/\mathfrak{T}_2$ it is isomorphic to the group
$R^+/\langle 2\pi \rangle$. This fact also follows from the isomorphism
theorem, since every proper motion can be obtained as the result of a
translation followed by a rotation around $P$, which means in
group-theoretic language that
$\mathfrak{B}_2^+ = \mathfrak{T}_2\mathfrak{B}_{2,P}^+$. Since
$\mathfrak{B}_{2,P}^+$ and $\mathfrak{T}_2$ have only the mapping 1 in
common, it follows from the isomorphism theorem that

$$\mathfrak{B}_2^+/\mathfrak{T}_2 \cong \mathfrak{B}_{2,P}^+ \cong R^+/\langle 2\pi \rangle.$$


12. Composition Series, Jordan-Hölder Theorem


In dealing with the type problem we naturally try to divide up every 
group into its simplest possible components, as was done above for the 
case of Abelian groups. In the section on direct products we remarked 
that although the theorem on unique decomposition into a direct product 
is valid for an extensive class of groups, the indecomposable groups 
themselves are still too complicated to provide an acceptable survey of 
all types of groups. In the present section we undertake an analysis leading 
to a simpler class of basic components, though now there is the disad- 
vantage, not found in the direct product, that the given group is no longer 
uniquely determined by its components. 

12.1. We consider a group $\mathfrak{G} \ne 1$ for which there exists a
finite sequence of subgroups

$$K: \quad \mathfrak{G} = \mathfrak{N}_0 \supset \mathfrak{N}_1 \supset \cdots \supset \mathfrak{N}_l = 1 \tag{1}$$

such that $\mathfrak{N}_i$ is a maximal normal subgroup of
$\mathfrak{N}_{i-1}$ for $i = 1, \ldots, l$. Here $l$ is called the length of
the composition series $K$. The minimum of the lengths of all composition
series of $\mathfrak{G}$ is called the length of $\mathfrak{G}$. There
exist groups, for example J’+, that have no composition series, but such
groups are always of infinite order. If $\mathfrak{G}$ is simple, then
$l = 1$, since $\mathfrak{G} \supset 1$ is the only composition series of
$\mathfrak{G}$.

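From the lattice isomorphism §6(3) it follows that if

$$K_1: \quad \mathfrak{G}/\mathfrak{N} = \mathfrak{N}_0/\mathfrak{N} \supset \mathfrak{N}_1/\mathfrak{N} \supset \cdots \supset \mathfrak{N}_r/\mathfrak{N} = \mathfrak{N}/\mathfrak{N} \tag{2}$$

and

$$K_2: \quad \mathfrak{N} = \mathfrak{M}_0 \supset \mathfrak{M}_1 \supset \cdots \supset \mathfrak{M}_t = 1 \tag{3}$$

are composition series of $\mathfrak{G}/\mathfrak{N}$ and $\mathfrak{N}$,
respectively, then

$$K_3: \quad \mathfrak{G} = \mathfrak{N}_0 \supset \mathfrak{N}_1 \supset \cdots \supset \mathfrak{N}_r \supset \mathfrak{M}_1 \supset \cdots \supset \mathfrak{M}_t = 1 \tag{4}$$

is a composition series of $\mathfrak{G}$. Thus for any normal subgroup
$\mathfrak{N}$ of $\mathfrak{G}$ it is possible to construct a composition
series of $\mathfrak{G}$ that includes the subgroup $\mathfrak{N}$, provided
$\mathfrak{G}/\mathfrak{N}$ and $\mathfrak{N}$ themselves have composition
series. The part of $K$ that lies in $\mathfrak{N}_i$ is a composition
series for $\mathfrak{N}_i$.

The factor groups $\mathfrak{N}_{i-1}/\mathfrak{N}_i$ occurring in (1) are
called composition factors of $K$. Since $\mathfrak{N}_i$ is a maximal
normal subgroup of $\mathfrak{N}_{i-1}$, these composition factors are
simple groups. If all the composition factors of a composition series are
Abelian or if they are all simple groups of prime power order, the group
$\mathfrak{G}$ is said to be solvable, a term which arises from the
applications of groups to the theory of fields (see IB7, §10). A solvable
group $\mathfrak{G}$ is necessarily finite.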

Now it is natural to ask whether a solvable group may not have other 
composition series with non-Abelian composition factors. The answer to 
this question is part of the following, more general, theorem. 


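Jordan-Hölder theorem: if $l$ is the length of $\mathfrak{G}$ and

$$K: \quad \mathfrak{G} = \mathfrak{N}_0 \supset \cdots \supset \mathfrak{N}_l = 1$$

and

$$L: \quad \mathfrak{G} = \mathfrak{M}_0 \supset \cdots \supset \mathfrak{M}_s = 1$$

are composition series of $\mathfrak{G}$, then $s = l$, and there exists a
permutation $\sigma$ of the indices such that
$\mathfrak{N}_{i-1}/\mathfrak{N}_i \cong \mathfrak{M}_{i\sigma-1}/\mathfrak{M}_{i\sigma}$.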


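Proof. From each class of isomorphic simple groups we choose a fixed
representative $\mathfrak{E}$ and denote by $n_{\mathfrak{E}}^K$ and
$n_{\mathfrak{E}}^L$ the number of composition factors of $K$ and $L$ that
are isomorphic to $\mathfrak{E}$. In (4) it is clear that

$$n_{\mathfrak{E}}^{K_3} = n_{\mathfrak{E}}^{K_1} + n_{\mathfrak{E}}^{K_2}.$$

Our theorem is proved if $n_{\mathfrak{E}}^K = n_{\mathfrak{E}}^L$ for every
$\mathfrak{E}$; that is, if
$n_{\mathfrak{E}}^K = n_{\mathfrak{E}}^L = n_{\mathfrak{E}}^{\mathfrak{G}}$
depends only on $\mathfrak{G}$ and not on the particular composition series.
If $\mathfrak{G}$ is simple, the theorem is obvious. Arguing by complete
induction on the length $l$ of $\mathfrak{G}$, we now assume that $l \ne 1$
and that the theorem is already proved for all groups of smaller length
than $\mathfrak{G}$. Then, if $\mathfrak{N}_1 = \mathfrak{M}_1$, we have

$$n_{\mathfrak{E}}^K = n_{\mathfrak{E}}^{\mathfrak{G}/\mathfrak{N}_1} + n_{\mathfrak{E}}^{\mathfrak{N}_1} = n_{\mathfrak{E}}^{\mathfrak{G}/\mathfrak{M}_1} + n_{\mathfrak{E}}^{\mathfrak{M}_1} = n_{\mathfrak{E}}^L,$$

since $\mathfrak{G}/\mathfrak{N}_1 = \mathfrak{G}/\mathfrak{M}_1$ and
$\mathfrak{N}_1 = \mathfrak{M}_1$ are of smaller length than $\mathfrak{G}$.
Otherwise we must have $\mathfrak{G} = \mathfrak{N}_1\mathfrak{M}_1$, since
$\mathfrak{N}_1$ and $\mathfrak{M}_1$ are maximal normal subgroups of
$\mathfrak{G}$. By the isomorphism theorem we then have
$\mathfrak{G}/\mathfrak{N}_1 \cong \mathfrak{M}_1/\mathfrak{N}_1 \cap \mathfrak{M}_1$
and
$\mathfrak{G}/\mathfrak{M}_1 \cong \mathfrak{N}_1/\mathfrak{M}_1 \cap \mathfrak{N}_1$.
But $\mathfrak{G}/\mathfrak{N}_1$, $\mathfrak{G}/\mathfrak{M}_1$, and
$\mathfrak{D} = \mathfrak{M}_1 \cap \mathfrak{N}_1$ are of smaller length
than $\mathfrak{G}$, so that

$$n_{\mathfrak{E}}^K = n_{\mathfrak{E}}^{\mathfrak{G}/\mathfrak{N}_1} + n_{\mathfrak{E}}^{\mathfrak{N}_1/\mathfrak{D}} + n_{\mathfrak{E}}^{\mathfrak{D}} = n_{\mathfrak{E}}^{\mathfrak{M}_1/\mathfrak{D}} + n_{\mathfrak{E}}^{\mathfrak{G}/\mathfrak{M}_1} + n_{\mathfrak{E}}^{\mathfrak{D}} = n_{\mathfrak{E}}^L.$$

Thus $n_{\mathfrak{E}}^K = n_{\mathfrak{E}}^L = n_{\mathfrak{E}}^{\mathfrak{G}}$
depends only on $\mathfrak{G}$, as was to be proved.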
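
The theorem is easy to observe for cyclic groups. The following Python
sketch rests on a standard fact assumed here and not proved in this section:
a cyclic group of order $n$ has exactly one subgroup for each divisor of
$n$, so a composition series is a chain of divisors in which successive
quotients are primes. The function names and the choice $n = 12$ are
illustrative.

```python
def prime_factors(n):
    """Distinct prime factors of n in increasing order."""
    primes, d = [], 2
    while d * d <= n:
        if n % d == 0:
            primes.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        primes.append(n)
    return primes


def composition_series(n):
    """All chains n = d0 > d1 > ... > 1 of subgroup orders with prime quotients."""
    if n == 1:
        return [[1]]
    return [[n] + tail
            for p in prime_factors(n)
            for tail in composition_series(n // p)]


def factor_orders(series):
    """Orders of the composition factors, sorted to ignore their sequence."""
    return sorted(a // b for a, b in zip(series, series[1:]))


all_series = composition_series(12)
for s in all_series:
    print(s, factor_orders(s))
```

The three composition series of the cyclic group of order 12 differ as
chains, but each has the composition factors of orders 2, 2, 3.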


12.2. The great importance of simple groups in the general theory
of groups becomes even clearer from the Jordan-Hölder theorem than
from the remark in §6.4. Thus it is natural to ask whether we can find
a satisfactory solution of the type problem for simple groups. But here
the situation is as follows. In addition to the simple groups of prime order,
research has uncovered many non-Abelian simple groups of finite and
infinite order, but even the proof of such a basic statement as every finite
non-Abelian simple group is of even order (conjectured by W. Burnside
in the nineteenth century) seems to require almost all the immense
apparatus of the present-day theory of finite groups.* So we shall content
ourselves here with the proof, given in §15.4.3, that there exist infinitely
many of these finite non-Abelian simple groups.


13. Normalizer, Centralizer, Center 


In order to acquire insight into the structure of non-Abelian (i.e., 
noncommutative) groups, we must first introduce certain concepts, such 
as the commutator group, which measure the extent to which a group 
departs from being commutative. 


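13.1. For a given complex $\mathfrak{K}$ of the group $\mathfrak{G}$ let us
enumerate the complexes conjugate to $\mathfrak{K}$. For this purpose we
note that the elements $G \in \mathfrak{G}$ for which
$G^{-1}\mathfrak{K}G = \mathfrak{K}$ form a subgroup, since from
$G_1^{-1}\mathfrak{K}G_1 = G_2^{-1}\mathfrak{K}G_2 = \mathfrak{K}$ it follows
that $G_2\mathfrak{K}G_2^{-1} = \mathfrak{K}$ and
$(G_1G_2^{-1})^{-1}\mathfrak{K}(G_1G_2^{-1}) = G_2G_1^{-1}\mathfrak{K}G_1G_2^{-1} = \mathfrak{K}$.


* For the proof of this theorem, see: Feit, W., and Thompson, J. G., Solvability of
groups of odd order, Pacific J. of Math. 13 (1963), pp. 775-1029.

This subgroup is denoted by $N_{\mathfrak{K}}$ and is called the normalizer
of $\mathfrak{K}$ (in $\mathfrak{G}$). Then the statement

$$G^{-1}\mathfrak{K}G = H^{-1}\mathfrak{K}H$$

or

$$HG^{-1}\mathfrak{K}GH^{-1} = \mathfrak{K}$$

is equivalent to

$$HG^{-1} \in N_{\mathfrak{K}}$$

or

$$H \in N_{\mathfrak{K}}G.$$

Thus the number of complexes conjugate to $\mathfrak{K}$ is equal to
$\mathfrak{G} : N_{\mathfrak{K}}$.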

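If $\mathfrak{U}$ is a subgroup of $\mathfrak{G}$, it follows from the
definition of $N_{\mathfrak{U}}$ that $N_{\mathfrak{U}}$ is the largest
subgroup of $\mathfrak{G}$ in which $\mathfrak{U}$ is a normal subgroup. If
the complex in question consists of the single element $G$, then $N_G$ is
called the centralizer of $G$, and more generally

$$Z_{\mathfrak{K}} = \bigcap_{G \in \mathfrak{K}} N_G$$

is the centralizer of the complex $\mathfrak{K}$. This centralizer consists
of all the elements of $\mathfrak{G}$ that commute with every element of
$\mathfrak{K}$. If $\mathfrak{G}$ is Abelian, then
$Z_{\mathfrak{K}} = \mathfrak{G}$ for every complex
$\mathfrak{K} \subseteq \mathfrak{G}$. The complex $Z_{\mathfrak{K}}$ is a
subgroup of $\mathfrak{G}$, since it is the intersection of certain
subgroups. The centralizer $Z_{\mathfrak{G}}$ of the whole group
$\mathfrak{G}$ is called the center of $\mathfrak{G}$. The center
$Z_{\mathfrak{G}}$ is Abelian, and every subgroup of it is a normal subgroup
of $\mathfrak{G}$.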


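13.2. If the factor group $\mathfrak{G}/\mathfrak{Z}$ of a subgroup
$\mathfrak{Z}$ in the center of $\mathfrak{G}$ is cyclic, then
$\mathfrak{G}$ is Abelian: for if
$\mathfrak{G}/\mathfrak{Z} = \langle \mathfrak{Z}G \rangle$, then every
element $G' \in \mathfrak{G}$ can be written in the form

$$G' = ZG^n$$

with $Z \in \mathfrak{Z}$. But for two elements $G_1 = Z_1G^{n_1}$,
$G_2 = Z_2G^{n_2}$, $Z_1, Z_2 \in \mathfrak{Z}$ we then have

$$G_1G_2 = Z_1G^{n_1}Z_2G^{n_2} = Z_1Z_2G^{n_1}G^{n_2} = Z_2G^{n_2}Z_1G^{n_1} = G_2G_1.$$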


13.3. The relation of conjugacy is easily seen to produce a partition into
classes for the elements of $\mathfrak{G}$. The single-element classes are
exactly the elements of the center of $\mathfrak{G}$. If $K_1, \ldots, K_r$
are representatives of the remaining classes, the result of §13.1 shows
that the following equation must hold for the number of elements in
$\mathfrak{G}$:

$$\mathfrak{G} : 1 = Z_{\mathfrak{G}} : 1 + \mathfrak{G} : Z_{K_1} + \cdots + \mathfrak{G} : Z_{K_r} \tag{1}$$

with $Z_{K_i} \ne \mathfrak{G}$; $i = 1, \ldots, r$. This equation is called
the class equation of $\mathfrak{G}$.

For $\mathfrak{S}^{(3)}$ there exist three classes of conjugate elements:

$$\{1\}, \quad \{\alpha, \beta\}, \quad \{\gamma, \delta, \epsilon\}.$$

Thus the class equation is

$$6 = 1 + 2 + 3.$$
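
The class equation of the symmetric group on three points can be recomputed
directly. In the following Python sketch the encoding of permutations as
tuples and all function names are choices made here for illustration.

```python
from itertools import permutations


def compose(p, q):
    """Apply p first, then q; p[i] is the image of the point i."""
    return tuple(q[p[i]] for i in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)


G = list(permutations(range(3)))  # the symmetric group on three points


def conjugacy_class(k):
    """All elements g^-1 k g with g in G."""
    return frozenset(compose(compose(inverse(g), k), g) for g in G)


def centralizer(k):
    """All elements of G that commute with k."""
    return [g for g in G if compose(g, k) == compose(k, g)]


classes = {conjugacy_class(k) for k in G}
class_sizes = sorted(len(c) for c in classes)

# By §13.1 each class has as many elements as the index of the centralizer.
index_matches = all(len(conjugacy_class(k)) == len(G) // len(centralizer(k))
                    for k in G)

center = [z for z in G if all(compose(z, g) == compose(g, z) for g in G)]
print(class_sizes, len(center))
```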


14. p-Groups 


14.1. A finite group $\mathfrak{G}$ of prime power order $p^n$ is called a
p-group. From the results of the preceding section we can show that
p-groups are solvable. For this purpose we require the following lemma:


If $\mathfrak{G}$ is a p-group, then $Z_{\mathfrak{G}} \ne 1$.


Proof. The indices $\mathfrak{G} : Z_{K_i}$ on the right side of the class
equation for $\mathfrak{G}$ are $\ne 1$, and since they are factors of the
order of the group, they must be powers of $p$. The left side is $p^n$;
therefore $p$ must be a factor of $Z_{\mathfrak{G}} : 1$.

From this lemma we see at once that every p-group is solvable.


Proof. The theorem is true if $\mathfrak{G}$ is Abelian. For
$\mathfrak{G} : 1 \ne p$ it then follows by complete induction on the order,
first for $\mathfrak{G}/Z_{\mathfrak{G}}$ and $Z_{\mathfrak{G}}$ and then for
$\mathfrak{G}$, if we construct a composition series for $\mathfrak{G}$ that
passes through $Z_{\mathfrak{G}}$.

We leave to the reader the proof that every maximal subgroup of a 
p-group is a normal subgroup, as well as the verification of these theorems 
for the examples B,.9 and Q (§1.2.7, §6.2). 


14.2. Since the foregoing theorems show that the order of the center
of a group of order $p^2$ is either $p$ or $p^2$, the factor group with
respect to the center of such a group is either of order $p$ or of order 1,
and thus is cyclic in every case. As was shown in §13.2, it follows that a
group of order $p^2$ must be Abelian. The existence of non-Abelian groups of
order $p^3$
is shown by the examples By9 and Q (§§1.2.7, 6.1). 
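
The lemma on the center can be observed in a non-Abelian group of order
$2^3$. The following Python sketch uses the symmetry group of a square,
written as permutations of its four vertices; this example is chosen here
for illustration and is not claimed to coincide with the examples named in
the text.

```python
def compose(p, q):
    """Apply p first, then q; p[i] is the image of the vertex i."""
    return tuple(q[p[i]] for i in range(len(p)))


def generate(gens):
    """The subgroup generated by the permutations in gens."""
    identity = tuple(range(len(gens[0])))
    group = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group


# Quarter-turn of the square and the reflection fixing the vertices 0 and 2.
D4 = generate([(1, 2, 3, 0), (0, 3, 2, 1)])

center = [z for z in D4 if all(compose(z, g) == compose(g, z) for g in D4)]
is_abelian = all(compose(a, b) == compose(b, a) for a in D4 for b in D4)

print(len(D4), len(center), is_abelian)
```

The group has order 8, is not Abelian, and its center has order 2, so the
factor group with respect to the center has order 4 and is not cyclic, in
agreement with §13.2.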


14.3. The great importance of p-groups for the general theory of 
finite groups rests on the fact that the order of a subgroup is a factor 
of the order of the group (§3.5). But the example SG“ shows that the 
converse is not necessarily true; i.e., there exists a factor d of GS : 1 for 
which there is no subgroup of order $d$. However, we do have the theorem of
Sylow: if $p^\alpha$ is the highest power of the prime $p$ which is a factor
of $\mathfrak{G} : 1$, then there exists in $\mathfrak{G}$ a subgroup of
order $p^\alpha$. Thus every group contains at least one subgroup whose
order is the highest possible power of $p$ permitted by the theorem of
Lagrange (§3.3(8)). The proof proceeds by induction on the order of
$\mathfrak{G}$, as follows. For $\mathfrak{G} = 1$ the theorem is obvious.
Now let us first suppose that there exists a proper subgroup $\mathfrak{U}$
of $\mathfrak{G}$, whose index $\mathfrak{G} : \mathfrak{U}$ is not divisible
by $p$. Then $p^\alpha$ is a factor of $\mathfrak{U} : 1$. If we now assume
that the theorem has already been proved for all groups of order smaller
than $\mathfrak{G} : 1$, then $\mathfrak{U}$ contains a subgroup of order
$p^\alpha$, which as a subgroup of $\mathfrak{G}$ provides the desired
result. If $p$ is a factor of the indices of all the proper subgroups, then
$p$ must be a factor of $Z_{\mathfrak{G}} : 1$ in the class equation §13(1)
for $\mathfrak{G}$, since $\mathfrak{G} : 1$ and all the
$\mathfrak{G} : Z_{K_i}$ are divisible by $p$. By the fundamental theorem on
finite Abelian groups, there exists in $Z_{\mathfrak{G}}$ a subgroup
$\mathfrak{P}_0 \ne 1$ whose order is a power of $p$, say $p^{\alpha_0}$.
This subgroup is a normal subgroup of $\mathfrak{G}$ and the group
$\mathfrak{G}/\mathfrak{P}_0$ is of smaller order than $\mathfrak{G}$. The
greatest power of $p$ that divides $\mathfrak{G} : \mathfrak{P}_0$ is
$p^{\alpha - \alpha_0}$. Thus by the induction hypothesis there exists in
$\mathfrak{G}/\mathfrak{P}_0$ a subgroup $\mathfrak{P}/\mathfrak{P}_0$ of
order $p^{\alpha - \alpha_0}$. Then the subgroup $\mathfrak{P}$ of
$\mathfrak{G}$ has the order $p^\alpha$, as was to be proved.

The p-subgroups whose existence has just been proved are called
Sylow p-groups of $\mathfrak{G}$, in honor of their discoverer. The theorem
does not state that there exists only one Sylow p-group for every prime
$p$, but Sylow did prove that every p-subgroup of $\mathfrak{G}$ has a
conjugate in an arbitrarily preassigned Sylow p-group of $\mathfrak{G}$, so
that, in particular, all Sylow p-groups of $\mathfrak{G}$ for a fixed prime
$p$ are conjugate to one another.
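
Both the failure of the converse of Lagrange's theorem and the existence of
Sylow groups can be observed in a group of order 12. The following Python
sketch uses the group of even permutations of four points, a standard
example chosen here and not necessarily the one named in the text. It
collects all subgroups generated by at most two elements; since a group of
order 6 is always generated by two elements, this search suffices to decide
whether a subgroup of order 6 exists.

```python
from itertools import combinations, permutations


def compose(p, q):
    """Apply p first, then q; p[i] is the image of the point i."""
    return tuple(q[p[i]] for i in range(len(p)))


def generate(gens):
    """The subgroup generated by the permutations in gens."""
    identity = tuple(range(len(gens[0])))
    group = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group


def is_even(p):
    """A permutation is even if it has an even number of inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return inversions % 2 == 0


A4 = [p for p in permutations(range(4)) if is_even(p)]

# All subgroups generated by at most two elements of A4.
subgroups = {frozenset(generate([a, b])) for a, b in combinations(A4, 2)}
subgroups.add(frozenset(generate([A4[0]])))  # the unit subgroup

orders = sorted({len(s) for s in subgroups})
sylow_2 = [s for s in subgroups if len(s) == 4]
sylow_3 = [s for s in subgroups if len(s) == 3]
print(orders, len(sylow_2), len(sylow_3))
```

The order 6 does not occur, although 6 is a factor of 12, whereas subgroups
of the orders 4 and 3 required by the theorem of Sylow are present.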

The p-groups are included in a larger class of groups, the so-called 
nilpotent groups, which are defined as the direct products of groups of 
prime power order. They are of particular interest because for them we 
can prove the converses of the theorems given above for p-groups. For 
example: a finite group is nilpotent if and only if every factor group has the 
property that its center is not merely the unit element. Or: a finite group is 
nilpotent if and only if every maximal subgroup is a normal subgroup. 
For lack of space the proofs must be omitted. 


15. Permutation Groups 


15.1. Representations 


In the example of groups of motions we have seen that a group may be 
much easier to investigate if it is not defined abstractly, say by its multi- 
plication table, but in some geometric way. One simple but effective 
method (see the examples below) for getting a clearer picture of the 
concept of a group is to investigate the possibilities of representing the 
group as a permutation group, that is as a subgroup of
$\mathfrak{S}^{\mathfrak{R}}$ for a suitable space $\mathfrak{R}$. Let us
examine these possibilities.


15.1.1. For an arbitrary group $\mathfrak{G}$ let $\sigma$ be a
homomorphism of $\mathfrak{G}$ into $\mathfrak{S}^{\mathfrak{R}}$. Then
$\sigma$ is called a permutation representation of $\mathfrak{G}$ in
$\mathfrak{R}$ or simply a representation. The number $|\mathfrak{R}|$ of
elements in $\mathfrak{R}$ is called the degree of the representation.

If $\sigma_1$, $\sigma_2$ are representations of $\mathfrak{G}$ in
$\mathfrak{R}_1$ and $\mathfrak{R}_2$, respectively, and if the two
permutations differ only by a renaming of the permuted points, i.e., if
there exists a one-to-one mapping $\tau$ of $\mathfrak{R}_1$ onto
$\mathfrak{R}_2$ such that

$$(PG^{\sigma_1})^\tau = P^\tau G^{\sigma_2} \quad \text{for all } G \in \mathfrak{G} \text{ and } P \in \mathfrak{R}_1, \tag{1}$$

then we say that $\sigma_1$ and $\sigma_2$ are similar. The relation of
similarity obviously produces a partition into classes and, as we have done
up to now for isomorphic groups, we shall consider similar representations
as essentially not distinct. For example, the two representations
$\sigma_1$, $\sigma_2$ of the cyclic group $\mathfrak{G} = \{1, G\}$ of order
2 in $\mathfrak{R}_1 = \{a, b, c\}$ and $\mathfrak{R}_2 = \{A, B, C\}$ are


similar:
r=689 e=C89) 
m=(6i S=Gad 


Here the relation (1) is established by the mapping $\tau$ with
$a^\tau = A$, $b^\tau = B$, $c^\tau = C$.


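15.1.2. When a fixed representation $\sigma$ of $\mathfrak{G}$ in
$\mathfrak{R}$ is being considered, we shall for brevity set

$$PG = PG^\sigma, \qquad P \in \mathfrak{R}.$$

For every $P \in \mathfrak{R}$ the elements $G \in \mathfrak{G}$ leaving $P$
fixed form a subgroup $\mathfrak{G}_P = \{G \mid PG = P\}$, as we have seen
in §3.3.4. We shall call it the fix-group of $P$.

Subspaces of $\mathfrak{R}$ of the form $P\mathfrak{G}$ are called domains
of transitivity of $\sigma$. Every element $P \in \mathfrak{R}$ is contained
in exactly one domain of transitivity, since

$$Q \in P_1\mathfrak{G} \cap P_2\mathfrak{G},$$

i.e.,

$$Q = P_1G_1 = P_2G_2, \qquad G_1, G_2 \in \mathfrak{G},$$

implies

$$P_1G_1\mathfrak{G} = P_2G_2\mathfrak{G}$$

and

$$P_1\mathfrak{G} = P_2\mathfrak{G}.$$

Thus the domains of transitivity $\mathfrak{R}^{(i)}$ of $\mathfrak{R}$
produce a partition into classes of $\mathfrak{R}$:

$$\mathfrak{R} = \mathfrak{R}^{(1)} \cup \cdots \cup \mathfrak{R}^{(r)}, \qquad \mathfrak{R}^{(i)} \cap \mathfrak{R}^{(k)} = \emptyset \text{ for } i \ne k.$$

For every $i = 1, \ldots, r$ the representation $\sigma$ induces a
representation $\sigma_i$ of $\mathfrak{G}$ in $\mathfrak{R}^{(i)}$, which is
related to $\sigma$ in the following way:

$$P_iG^\sigma = P_iG^{\sigma_i} \quad \text{for } P_i \in \mathfrak{R}^{(i)}.$$

Thus it is sufficient to consider representations $\sigma$ of
$\mathfrak{G}$ in $\mathfrak{R}$ for which $\mathfrak{R}$ itself is a domain
of transitivity. Such representations $\sigma$ are said to be transitive.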

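Transitive representations of $\mathfrak{G}$ can be obtained in the
following way: choose a subgroup $\mathfrak{U}$ of $\mathfrak{G}$ and then
in the set $\mathfrak{R}$ of cosets $\mathfrak{U}H$ of $\mathfrak{U}$ define
the permutation $G^{\sigma_{\mathfrak{U}}}$ by

$$(\mathfrak{U}H)G^{\sigma_{\mathfrak{U}}} = \mathfrak{U}HG.$$

Since

$$(\mathfrak{U}H)(G_1G_2)^{\sigma_{\mathfrak{U}}} = \mathfrak{U}HG_1G_2 = (\mathfrak{U}HG_1)G_2 = ((\mathfrak{U}H)G_1^{\sigma_{\mathfrak{U}}})G_2^{\sigma_{\mathfrak{U}}} = (\mathfrak{U}H)G_1^{\sigma_{\mathfrak{U}}}G_2^{\sigma_{\mathfrak{U}}},$$

it follows that $\sigma_{\mathfrak{U}}$ is a representation of
$\mathfrak{G}$. Every coset has the form $\mathfrak{U}G$,
$G \in \mathfrak{G}$, so that the representation is transitive. We call it
the representation of $\mathfrak{G}$ induced by $\mathfrak{U}$. It is clear
that $\mathfrak{U}$ is the fix-group of $\mathfrak{U}$ (as a point of
$\mathfrak{R}$) and $\mathfrak{G} : \mathfrak{U}$ is the degree of
$\sigma_{\mathfrak{U}}$.

Then the following theorem shows that, up to similarity, we have thus
obtained all the transitive representations of $\mathfrak{G}$: if $\sigma$
is any transitive permutation representation of $\mathfrak{G}$ in
$\mathfrak{R}$ and if $P \in \mathfrak{R}$, then $\sigma$ is similar to the
representation $\sigma_{\mathfrak{U}}$ of $\mathfrak{G}$ induced by
$\mathfrak{U} = \mathfrak{G}_P$.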


Proof. Since

$$PH_1^\sigma = PH_2^\sigma$$

is equivalent to

$$PH_1^\sigma (H_2^\sigma)^{-1} = P$$

and to

$$H_1H_2^{-1} \in \mathfrak{G}_P \quad \text{or} \quad \mathfrak{G}_PH_1 = \mathfrak{G}_PH_2,$$

the mapping $\tau$ defined by

$$(PH^\sigma)^\tau = \mathfrak{G}_PH$$

is a one-to-one mapping of $\mathfrak{R}$ into the set of right cosets of
$\mathfrak{G}_P$, and the transitivity means that every coset of
$\mathfrak{G}_P$ is an image under $\tau$. Finally, for $Q = PH^\sigma$ we
have

$$Q^\tau G^{\sigma_{\mathfrak{U}}} = \mathfrak{G}_PHG = (P(HG)^\sigma)^\tau = (QG^\sigma)^\tau,$$

so that (1) is satisfied for $\sigma_1 = \sigma$,
$\sigma_2 = \sigma_{\mathfrak{U}}$.

For transitive representations of $\mathfrak{G}$ in $\mathfrak{R}$ the
preceding theorem implies in particular that

$$\mathfrak{G} : 1 = (\mathfrak{G}_P : 1) \cdot |\mathfrak{R}|. \tag{2}$$

Consequently, the degree of a transitive representation is a factor of the
order of the group.

We remark without proof that the representations induced by two
subgroups $\mathfrak{U}$, $\mathfrak{V}$ are similar if and only if the
subgroups are conjugate under $\mathfrak{G}$.
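
Domains of transitivity, fix-groups, and the relation (2) can be computed
for a small permutation group. In the following Python sketch the group, a
group of order 6 acting on five points, and all function names are
assumptions made here for illustration; the relation (2) is applied to the
transitive representation induced on each domain of transitivity.

```python
def compose(p, q):
    """Apply p first, then q; p[i] is the image of the point i."""
    return tuple(q[p[i]] for i in range(len(p)))


def generate(gens):
    """The group generated by the permutations in gens."""
    identity = tuple(range(len(gens[0])))
    group = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group


def orbit(point, group):
    """The domain of transitivity containing the given point."""
    return {g[point] for g in group}


def fix_group(point, group):
    """The elements of the group leaving the given point fixed."""
    return {g for g in group if g[point] == point}


# Generated by the transposition (0 1) and the 3-cycle (2 3 4).
G = generate([(1, 0, 2, 3, 4), (0, 1, 3, 4, 2)])

domains = {frozenset(orbit(p, G)) for p in range(5)}
formula_holds = all(len(G) == len(fix_group(p, G)) * len(orbit(p, G))
                    for p in range(5))

print(len(G), sorted(sorted(d) for d in domains), formula_holds)
```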


15.1.3. Of particular importance is the representation of $\mathfrak{G}$
induced by the subgroup 1, the so-called regular representation of
$\mathfrak{G}$. It plays a special role in the history of the subject, since
it was used by Cayley to show that every abstractly defined finite group
can also be defined "concretely," namely, as a permutation group.


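15.1.4. We now wish to determine the kernel $\mathfrak{G}_\sigma$ for a
transitive representation $\sigma$ of $\mathfrak{G}$. For this purpose we
first note that if $PG = Q$, $G \in \mathfrak{G}$, $P \in \mathfrak{R}$, and
$Q \in \mathfrak{R}$, then $\mathfrak{G}_Q = G^{-1}\mathfrak{G}_PG$, as is
clear from the fact that $Q = QH$ is equivalent to

$$PG = PGH$$

and

$$P = PGHG^{-1},$$

so that

$$GHG^{-1} \in \mathfrak{G}_P, \qquad H \in G^{-1}\mathfrak{G}_PG.$$

Thus it follows from the transitivity of $\sigma$ that for an arbitrary but
fixed point $P \in \mathfrak{R}$ the subgroup

$$\mathfrak{G}_\sigma = \bigcap_{G \in \mathfrak{G}} G^{-1}\mathfrak{G}_PG$$

leaves every point of $\mathfrak{R}$ fixed and is the kernel of $\sigma$. It
is the greatest normal subgroup of $\mathfrak{G}$ in $\mathfrak{G}_P$, as is
easily shown.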


15.2. The Symmetric Group of Permutations 


In the preceding section we have discussed group representations as
subgroups of the symmetric group. This discussion can be made more
useful for the general theory of groups through a closer study of the
structure of $\mathfrak{S}^{\mathfrak{R}}$, to which we shall devote the next
two sections. By regarding $\mathfrak{S}^{\mathfrak{R}}$ and its subgroups as
being represented by the identical representation of themselves we can make
use of the results of the preceding section, for which purpose it is
convenient to apply the concept of transitivity not to the representation
but to the group itself. We assume throughout that $\mathfrak{R}$ is finite.


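15.2.1. The group $\mathfrak{S}^{\mathfrak{R}}$ is obviously transitive, and for $P \in \mathfrak{R}$ we have
$\mathfrak{S}^{\mathfrak{R}}_P \cong \mathfrak{S}^{\mathfrak{R}-P}$, since $\mathfrak{S}^{\mathfrak{R}}_P$ contains all permutations of the points in
$\mathfrak{R} - P$. Thus by (2)

$\mathfrak{S}^{\mathfrak{R}} : 1 = (\mathfrak{S}^{\mathfrak{R}-P} : 1) \cdot |\mathfrak{R}|$.

Since $\mathfrak{S}^{\mathfrak{R}} : 1 = 1$ for $|\mathfrak{R}| = 1$, it follows by complete induction on the
cardinality $|\mathfrak{R}|$ that

$\mathfrak{S}^{\mathfrak{R}} : 1 = |\mathfrak{R}|!$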


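Now let $\alpha$ be an element of $\mathfrak{S}^{\mathfrak{R}}$. The domains of transitivity of $\langle\alpha\rangle$
have the form

$P\langle\alpha\rangle = \{P, P\alpha, \dots, P\alpha^{n-1}\}$,

where $n$ is the smallest positive integer with the property $P\alpha^{n} = P$. Then
$\alpha$ can be written as

(3)  $\alpha = \begin{pmatrix} P & P\alpha & \dots & P\alpha^{n-1} & Q & \dots \\ P\alpha & P\alpha^{2} & \dots & P & Q\alpha & \dots \end{pmatrix}$,

where $P, Q, \dots$ are representatives of the various domains of transitivity
of $\langle\alpha\rangle$. If the points of $\mathfrak{R}$, except for those of $P\langle\alpha\rangle$, are left unchanged by $\alpha$, we
write simply

$\alpha = (P, P\alpha, \dots, P\alpha^{n-1})$

and call $\alpha$ a cycle of length $n$. For example,

$\alpha = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & 6 & 3 & 2 & 4 & 5 \end{pmatrix} = (2\ 6\ 5\ 4)$

is a cycle in $\mathfrak{S}^{(6)}$.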
Cycles are of special importance, since every element $\alpha \in \mathfrak{S}^{\mathfrak{R}}$ is the
product of pointwise disjoint cycles. As can be calculated at once, the
element (3) may be written in the form

(4)  $\alpha = (P, P\alpha, \dots, P\alpha^{n-1})(Q, Q\alpha, \dots)(\dots)\cdots$,

which is called the canonical decomposition of $\alpha$ into cycles. Thus for

$\alpha = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 3 & 4 & 1 & 5 & 6 & 7 & 2 \end{pmatrix}$

this decomposition is

(5)  $\alpha = (1\ 3)(2\ 4\ 5\ 6\ 7)$.


In order to include the identical permutation 1 in this form of writing
we set

$1 = (P)$

with an arbitrary $P \in \mathfrak{R}$.


15.2.2. The advantage of the cycle notation lies not only in its greater 
conciseness but also in its convenience for determining the order and 
conjugacy of permutations. 

Thus we see at once that for a single cycle the order is equal to the
degree:

$(P_1 P_2 \dots P_n)^{n} = 1, \quad (P_1 P_2 \dots P_n)^{m} \neq 1$ for $1 \le m < n$.

Since pointwise disjoint cycles are permutable, we have

(6)  $\alpha = (P_{11} \dots P_{1n_1})(P_{21} \dots P_{2n_2}) \cdots (P_{r1} \dots P_{rn_r})$,
     $\alpha^{m} = (P_{11} \dots P_{1n_1})^{m}(P_{21} \dots P_{2n_2})^{m} \cdots (P_{r1} \dots P_{rn_r})^{m}$,

so that $\alpha^{m}$ is equal to the identical permutation if and only if each cycle
length divides $m$:

$n_1 \mid m, \ \dots, \ n_r \mid m$.

Thus the order of $\alpha$ is equal to the least common multiple of the lengths
of the cycles in the canonical decomposition of $\alpha$.

For example, the element $\alpha$ in (5) has the order 10.
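
The canonical decomposition and the rule for the order can be checked mechanically. The following is a minimal sketch, assuming a permutation is stored as a Python dictionary that maps each point to its image; the function names `cycles` and `order` are chosen here for illustration and do not come from the text.

```python
from math import lcm

def cycles(perm):
    """Canonical decomposition of perm (dict: point -> image) into disjoint cycles."""
    seen, result = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle, p = [], start
        while p not in seen:          # follow P, P*alpha, P*alpha^2, ... until it closes
            seen.add(p)
            cycle.append(p)
            p = perm[p]
        result.append(tuple(cycle))
    return result

def order(perm):
    """Order of perm: least common multiple of its cycle lengths."""
    return lcm(*(len(c) for c in cycles(perm)))

# The permutation of example (5): 1->3, 2->4, 3->1, 4->5, 5->6, 6->7, 7->2
alpha = {1: 3, 2: 4, 3: 1, 4: 5, 5: 6, 6: 7, 7: 2}
print(cycles(alpha))  # [(1, 3), (2, 4, 5, 6, 7)]
print(order(alpha))   # 10
```

Fixed points appear in this sketch as cycles of length 1, which does not affect the least common multiple.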

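We now wish to determine which elements are conjugate in $\mathfrak{S}^{\mathfrak{R}}$ to the
permutation whose canonical decomposition is given by (6). Let $\beta \in \mathfrak{S}^{\mathfrak{R}}$.
We see that

$\beta^{-1}\alpha\beta = (P_{11}\beta, P_{12}\beta, \dots, P_{1n_1}\beta) \cdots (P_{r1}\beta, \dots, P_{rn_r}\beta)$

is the canonical decomposition of $\beta^{-1}\alpha\beta$, since

$(P_{11}\beta)\beta^{-1} = P_{11}, \quad P_{11}\alpha = P_{12}$,

so that

$(P_{11}\beta)\,\beta^{-1}\alpha\beta = P_{12}\beta$,

and so forth.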
For example,


$\beta = (1\ 2\ 3), \quad \alpha = (1\ 2)$,
$\beta^{-1}\alpha\beta = (2\ 3)$.


We shall say that $\alpha$ and $\alpha^{*} = \beta^{-1}\alpha\beta$ are similar, by which we mean that
the cycles in the canonical representation of $\alpha$ and $\alpha^{*}$ can be put into
one-to-one correspondence with each other in such a way that corresponding
cycles have the same length. Then our problem is already
solved: Two elements of $\mathfrak{S}^{\mathfrak{R}}$ are conjugate in $\mathfrak{S}^{\mathfrak{R}}$ if and only if they are
similar.

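For the proof we need only show that two similar permutations

$\alpha = (P_{11} \dots P_{1n_1}) \cdots (P_{r1} \dots P_{rn_r})$,

$\alpha^{*} = (P^{*}_{11} \dots P^{*}_{1n_1}) \cdots (P^{*}_{r1} \dots P^{*}_{rn_r})$

are conjugate to each other. But with

$\beta = \begin{pmatrix} P_{ik} \\ P^{*}_{ik} \end{pmatrix}$,

that is, with the permutation that carries each $P_{ik}$ into $P^{*}_{ik}$,
it is easy to calculate that $\beta^{-1}\alpha\beta = \alpha^{*}$, as required.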
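
The relabeling rule for conjugates can be tried out numerically. The sketch below assumes permutations are dictionaries from points to images and that products are read from left to right (apply the left factor first), in keeping with the notation $P\alpha\beta$ used above. The helper names and the particular $\beta$ are illustrative choices, not taken from the text.

```python
def compose(a, b):
    """Product ab in the right-action convention: apply a first, then b."""
    return {p: b[a[p]] for p in a}

def inverse(a):
    """Inverse permutation: swap points and images."""
    return {image: p for p, image in a.items()}

def cycle_lengths(a):
    """Sorted list of the cycle lengths of a (fixed points count as length 1)."""
    seen, lengths = set(), []
    for start in a:
        if start in seen:
            continue
        n, p = 0, start
        while p not in seen:
            seen.add(p)
            p = a[p]
            n += 1
        lengths.append(n)
    return sorted(lengths)

alpha = {1: 3, 2: 4, 3: 1, 4: 5, 5: 6, 6: 7, 7: 2}   # (1 3)(2 4 5 6 7), example (5)
beta = {1: 2, 2: 3, 3: 1, 4: 7, 5: 5, 6: 4, 7: 6}    # an arbitrary permutation
conj = compose(compose(inverse(beta), alpha), beta)   # beta^-1 alpha beta

# beta^-1 alpha beta carries P*beta into (P*alpha)*beta for every point P
relabels = all(conj[beta[p]] == beta[alpha[p]] for p in alpha)
print(relabels)                                   # True
print(cycle_lengths(alpha), cycle_lengths(conj))  # [2, 5] [2, 5]
```

The second line of output shows that $\alpha$ and its conjugate are similar in the sense just defined.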


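15.2.3. From this result a permutation $\xi \in \mathfrak{S}^{\mathfrak{R}}$ for which $\xi\alpha = \alpha\xi$ for
all $\alpha \in \mathfrak{S}^{\mathfrak{R}}$ cannot be similar to any other permutation in $\mathfrak{S}^{\mathfrak{R}}$. For
$\xi = 1$, this condition is certainly satisfied for an arbitrary $\mathfrak{R}$, and for $|\mathfrak{R}| = 2$
it is also satisfied by the permutation distinct from 1. For $|\mathfrak{R}| \ge 3$ we see
that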


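$\alpha = (P_{11} P_{12} \dots) \cdots$

and

$\alpha' = (P_{11} P_{ik} \dots) \cdots$ with $P_{ik} \neq P_{11}, P_{12}$

are similar and that $\alpha \neq \alpha'$, since $P_{11}\alpha \neq P_{11}\alpha'$. Thus for $|\mathfrak{R}| \ge 3$ the
center of $\mathfrak{S}^{\mathfrak{R}}$ consists only of the identical permutation 1. For $|\mathfrak{R}| = 1, 2$
the group $\mathfrak{S}^{\mathfrak{R}}$ is obviously Abelian.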


15.3. The Alternating Group 

15.3.1. A cycle of degree 2 is called a transposition. Every finite cycle
is a product of transpositions:

$(P_1 \dots P_n) = (P_{n-1}P_n)(P_{n-2}P_{n-1}) \cdots (P_1P_2)$.
Since every permutation can be written as a product of cycles, it follows
that all permutations are also products of transpositions. However, the
transpositions are not necessarily pointwise disjoint. For example,

$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 3 & 1 & 5 & 4 \end{pmatrix} = (1\ 2\ 3)(4\ 5) = (2\ 3)(1\ 2)(4\ 5)$.

Thus the group $\mathfrak{S}^{\mathfrak{R}}$ is generated by the transpositions in $\mathfrak{R}$.
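
The factorization of a cycle into transpositions depends on the order in which products are read. The following sketch assumes the left-to-right convention (the left factor acts first) that is consistent with the example above; the helper names are hypothetical.

```python
def cycle_to_perm(cycle, points):
    """Permutation (dict: point -> image) on `points` given by a single cycle."""
    perm = {p: p for p in points}
    for i, p in enumerate(cycle):
        perm[p] = cycle[(i + 1) % len(cycle)]
    return perm

def compose(a, b):
    """Product ab: apply a first, then b."""
    return {p: b[a[p]] for p in a}

def as_transpositions(cycle):
    """Factors (P_{n-1} P_n), (P_{n-2} P_{n-1}), ..., (P_1 P_2), in this order."""
    return [(cycle[i], cycle[i + 1]) for i in range(len(cycle) - 2, -1, -1)]

points = range(1, 6)
cycle = (1, 2, 3, 4, 5)
factors = as_transpositions(cycle)
print(factors)  # [(4, 5), (3, 4), (2, 3), (1, 2)]

product = {p: p for p in points}
for t in factors:                       # multiply the transpositions from left to right
    product = compose(product, cycle_to_perm(t, points))
print(product == cycle_to_perm(cycle, points))  # True
```

A cycle of length $n$ is thus a product of $n - 1$ transpositions, a fact used again in §15.4.4.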


15.3.2. The elements of $\mathfrak{S}^{\mathfrak{R}}$ that can be written as the product of an
even number of transpositions are called even permutations. They form
a normal subgroup of $\mathfrak{S}^{\mathfrak{R}}$, since for

$\alpha = (P_1Q_1) \cdots (P_{2k}Q_{2k})$,

$\beta = (R_1S_1) \cdots (R_{2l}S_{2l})$

the element $\alpha\beta^{-1} = (P_1Q_1) \cdots (P_{2k}Q_{2k})(R_{2l}S_{2l}) \cdots (R_1S_1)$ is also the product
of an even number of transpositions, and conjugate permutations
are similar to each other. This normal subgroup is called the alternating
group in $\mathfrak{R}$ and is denoted by $\mathfrak{A}^{\mathfrak{R}}$. We now wish to prove: for $|\mathfrak{R}| \neq 1$,
the alternating group $\mathfrak{A}^{\mathfrak{R}}$ has index 2 in $\mathfrak{S}^{\mathfrak{R}}$, so that $\mathfrak{A}^{\mathfrak{R}}$ has the order
$|\mathfrak{R}|!/2$.

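For the proof we take $\mathfrak{R} = \{1, \dots, n\}$. Then for an arbitrary polynomial
$f(x_1, \dots, x_n)$ in the indeterminates $x_1, \dots, x_n$ and for $\alpha \in \mathfrak{S}^{\mathfrak{R}}$ we define

$f^{\alpha}(x_1, \dots, x_n) = f(x_{1\alpha}, \dots, x_{n\alpha})$,

so that

$(f^{\alpha})^{\beta} = f^{\alpha\beta}$.

As a particular polynomial let us consider the difference product

$\Delta(x_1, \dots, x_n) = \prod_{i<k} (x_i - x_k)$.

Apart from a change of sign, the application of $\alpha$ to $\Delta$ merely permutes
the factors of the right-hand side:

$\Delta^{\alpha} = \varepsilon_\alpha \Delta, \quad \varepsilon_\alpha = \pm 1$,

so that

$\varepsilon_{\alpha\beta}\Delta = \Delta^{\alpha\beta} = (\Delta^{\alpha})^{\beta} = \varepsilon_\alpha \Delta^{\beta} = \varepsilon_\alpha\varepsilon_\beta\Delta$.

The mapping $\varepsilon : \alpha \to \varepsilon_\alpha$ is therefore a homomorphism of $\mathfrak{S}^{\mathfrak{R}}$ into the
cyclic group of order 2. The desired theorem now follows from the
homomorphism theorem if we show that $\varepsilon$ is a homomorphism onto
this group and that its kernel is $\mathfrak{A}^{\mathfrak{R}}$. But it is easy to show that

$\Delta^{(12)} = -\Delta$,

so that $\varepsilon_{(12)} = -1$. Also, since every transposition $(a\,b)$ in $\mathfrak{S}^{\mathfrak{R}}$ is
conjugate to $(1\,2)$, say

$\beta^{-1}(1\ 2)\beta = (a\ b)$,

we have

$\varepsilon_{(ab)} = \varepsilon_{\beta^{-1}}\varepsilon_{(12)}\varepsilon_\beta = (\varepsilon_\beta)^{-1}\varepsilon_{(12)}\varepsilon_\beta = \varepsilon_{(12)}$.

The product of an even number of transpositions is thus mapped by
$\varepsilon$ onto 1, and the product of an odd number onto $-1$, so that the proof
of the theorem is complete.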
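
The sign $\varepsilon_\alpha$ can be computed directly from the difference product by substituting the values $x_j = j$. The sketch below does this for $n = 4$, with the points numbered from 0 for convenience and permutations stored as tuples of images; these conventions and the function names are assumptions made for the illustration.

```python
from itertools import permutations

def sign(perm):
    """epsilon of perm (tuple of images of 0..n-1): each pair i < k whose images
    are in reversed order flips the sign of one factor of the difference product."""
    s = 1
    n = len(perm)
    for i in range(n):
        for k in range(i + 1, n):
            if perm[i] > perm[k]:
                s = -s
    return s

def compose(a, b):
    """Product ab: apply a first, then b."""
    return tuple(b[a[i]] for i in range(len(a)))

n = 4
group = list(permutations(range(n)))
is_homomorphism = all(sign(compose(a, b)) == sign(a) * sign(b)
                      for a in group for b in group)
even = [a for a in group if sign(a) == 1]

print(is_homomorphism)        # True
print(len(group), len(even))  # 24 12
print(sign((1, 0, 2, 3)))     # -1, the transposition of the first two points
```

The kernel of the sign mapping has half the elements of the whole group, in agreement with the theorem.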


15.3.3. For $|\mathfrak{R}| = 1$ we have $\mathfrak{A}^{\mathfrak{R}} = \mathfrak{S}^{\mathfrak{R}}$, and for $n = 2, 3$ the
order of $\mathfrak{A}^{(n)}$ is 1, 3, respectively. For $n = 4$ the permutations

$(1\ 2)(3\ 4), \quad (1\ 3)(2\ 4), \quad (1\ 4)(2\ 3)$,

together with the unit element 1 form a normal subgroup of $\mathfrak{A}^{(4)}$ of order 4,
the so-called four-group. The lattice of subgroups of $\mathfrak{A}^{(4)}$ is given in
Figure 33. In all other cases $\mathfrak{A}^{(n)}$ is simple (and non-Abelian), as we shall
show in the next section for $n = 5$.


[Fig. 33. Lattice of subgroups of $\mathfrak{A}^{(4)}$, arranged by order.]


15.4. Applications of the Theory of Permutation Groups 
to the General Theory of Groups 

Let us now give some examples to show how our results on permutation 
groups can be applied to the general theory of groups. They are all 
concerned with the important question of finding conditions under 
which a group is simple. 

15.4.1. For a representation $\sigma$ of $\mathfrak{G}$ in $\mathfrak{R}$ the index $\mathfrak{G} : \mathfrak{G}_\sigma$ is always
a factor of $|\mathfrak{R}|!$, since the homomorphism theorem shows that $\mathfrak{G}/\mathfrak{G}_\sigma$ is
isomorphic to a subgroup of $\mathfrak{S}^{\mathfrak{R}}$. Thus for the representation induced
by a subgroup $\mathfrak{U}$ of $\mathfrak{G}$, the index of the greatest normal subgroup of $\mathfrak{G}$
contained in $\mathfrak{U}$ is a factor of $(\mathfrak{G} : \mathfrak{U})!$. If $\mathfrak{G}$ is simple, there is no proper subgroup
$\mathfrak{U}$ in $\mathfrak{G}$ for which $(\mathfrak{G} : \mathfrak{U})! < \mathfrak{G} : 1$. Since the group $\mathfrak{A}^{(5)}$ is shown below to
be simple, it contains no subgroup with index 2, 3, or 4.


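15.4.2. If $\mathfrak{G}$ is a group of order $2u$ with odd $u$, then $\mathfrak{G}$ contains a normal
subgroup $\mathfrak{N}$ with index 2. For let $\sigma$ be the regular representation of $\mathfrak{G}$
in $\mathfrak{R}$ $(= \mathfrak{G})$. Then for $G \in \mathfrak{G}$, $G \neq 1$, the permutation $G^{\sigma}$ leaves no point of $\mathfrak{R}$ fixed, since 1 is the
fix-group for all points of $\mathfrak{R}$. Now let $G$ be an element of order 2, which
must exist by the theorem of Sylow for $p = 2$. The canonical decomposition
of $G^{\sigma}$ into cycles consists of $u$ transpositions. Thus $G^{\sigma}$ is not an
element of $\mathfrak{A}^{\mathfrak{R}}$, so that the intersection $\mathfrak{G}^{\sigma} \cap \mathfrak{A}^{\mathfrak{R}}$ is $\neq \mathfrak{G}^{\sigma}$. Also, since
$\mathfrak{A}^{\mathfrak{R}}$ is a maximal subgroup of $\mathfrak{S}^{\mathfrak{R}}$, we have $\mathfrak{A}^{\mathfrak{R}}\mathfrak{G}^{\sigma} = \mathfrak{S}^{\mathfrak{R}}$. By the isomorphism
theorem it follows that $\mathfrak{N}^{\sigma} = \mathfrak{A}^{\mathfrak{R}} \cap \mathfrak{G}^{\sigma}$ is a normal subgroup
of $\mathfrak{G}^{\sigma}$ and

$\mathfrak{G}^{\sigma}/\mathfrak{N}^{\sigma} \cong \mathfrak{S}^{\mathfrak{R}}/\mathfrak{A}^{\mathfrak{R}}$.

Consequently, $\mathfrak{N}$ is a normal subgroup of $\mathfrak{G}$ with index 2. Since the Burnside conjecture
is now known to be true (§12.2), we have the result:
The order of every non-Abelian simple group is divisible by 4.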


15.4.3. From the results of the preceding sections we now prove the
existence, as stated above, of infinitely many non-Abelian simple groups,
by showing that if $\mathfrak{G}$ is a permutation group in $\mathfrak{R}$ with $|\mathfrak{R}| = r$ a prime,
and if $\mathfrak{G}$ is generated by cycles of length $r$, then $\mathfrak{G}$ is simple.


Proof. The group $\mathfrak{G}$ is transitive, since $P\langle Z\rangle = \mathfrak{R}$ for each of the
generating cycles $Z$ and $P \in \mathfrak{R}$. Let $\mathfrak{N}$ be a normal subgroup $\neq 1$ of $\mathfrak{G}$.
By (2) the fix-group $\mathfrak{G}_P$ has prime index $r$ in $\mathfrak{G}$ and is thus a maximal
subgroup of $\mathfrak{G}$. By §15.1.4 the normal subgroup $\mathfrak{N}$ is not in $\mathfrak{G}_P$, since 1
is the only permutation in $\mathfrak{G}$ that leaves fixed all points of $\mathfrak{R}$. Thus
$\mathfrak{N}\mathfrak{G}_P = \mathfrak{G}$ and by the isomorphism theorem

$\mathfrak{G}/\mathfrak{N} \cong \mathfrak{G}_P/(\mathfrak{N} \cap \mathfrak{G}_P)$.

Now only the first power of the prime $r$ is a factor of $\mathfrak{G} : 1$, since $\mathfrak{G}$ is a
subgroup of $\mathfrak{S}^{\mathfrak{R}}$ and $\mathfrak{S}^{\mathfrak{R}}$ has the order $r!$. Thus $\mathfrak{G}_P : 1$ and $\mathfrak{G} : \mathfrak{N}$ are
prime to $r$. If one of the generating cycles $Z$ were not in $\mathfrak{N}$, then $\mathfrak{N}\langle Z\rangle$
would be greater than $\mathfrak{N}$, and on the other hand it would follow from the
theorem of isomorphism that

$\mathfrak{N}\langle Z\rangle/\mathfrak{N} \cong \langle Z\rangle/(\mathfrak{N} \cap \langle Z\rangle)$,

so that $\mathfrak{N}\langle Z\rangle : \mathfrak{N}$ would be a factor of $\langle Z\rangle : 1 = r$ and would thus be $= r$,
in contradiction to the fact that $\mathfrak{G} : \mathfrak{N}$ is not divisible by $r$. Consequently,
all the generating cycles lie in $\mathfrak{N}$, so that $\mathfrak{G} = \mathfrak{N}$, and $\mathfrak{G}$ is therefore
simple.

If in this theorem we choose the generating cycles in such a way that
they are not all powers of a fixed element, as is always possible for $r \ge 5$,
then the group generated by them is simple (and non-Abelian). Thus
there exist infinitely many (nonisomorphic) non-Abelian simple groups.


15.4.4. For example, the (non-Abelian) group generated by the two
cycles

$(1\ 2\ 3\ 4\ 5\ 6\ 7), \quad (2\ 1\ 3\ 4\ 5\ 6\ 7)$

is simple.

For $r = 5$ there are $4! = 24$ distinct cycles of degree 5. By §15.3.1 they
are all products of $r - 1$ transpositions and thus lie in $\mathfrak{A}^{(5)}$. Since

$(a\ b\ c\ d\ e)(c\ b\ a\ d\ e) = (c\ e\ d)$,

the 20 cycles of degree 3 also lie in the subgroup of $\mathfrak{A}^{(5)}$ generated by the
24 cycles of degree 5. But since $\mathfrak{A}^{(5)} : 1 = 60$ has no proper factor $> 24 + 20$,
the theorem of Lagrange shows that this subgroup must be equal to $\mathfrak{A}^{(5)}$.
Thus the above theorem shows that $\mathfrak{A}^{(5)}$ is simple. Let us state here
without proof that there are no non-Abelian simple groups of order less
than 60.
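
The counting argument for $r = 5$ can be reproduced by brute force. The following sketch generates the group spanned by two cycles of degree 5 that are not powers of each other; the points are numbered 0 to 4, products are read from left to right, and the names `generate` and `cycle` as well as the choice of the two generators are assumptions made for the illustration.

```python
def compose(a, b):
    """Product ab of permutations given as tuples of images: a first, then b."""
    return tuple(b[a[i]] for i in range(len(a)))

def cycle(points, n):
    """The cycle (points[0] points[1] ...) as a permutation of 0..n-1."""
    perm = list(range(n))
    for i, p in enumerate(points):
        perm[p] = points[(i + 1) % len(points)]
    return tuple(perm)

def generate(generators):
    """All products of the generators (sufficient in a finite group)."""
    identity = tuple(range(len(generators[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

z1 = cycle((0, 1, 2, 3, 4), 5)
z2 = cycle((1, 0, 2, 3, 4), 5)
group = generate([z1, z2])

def moved(g):
    """Number of points not left fixed by g."""
    return sum(1 for i, image in enumerate(g) if i != image)

print(len(group))                              # 60
print(sum(1 for g in group if moved(g) == 5))  # 24 cycles of degree 5
print(sum(1 for g in group if moved(g) == 3))  # 20 cycles of degree 3
```

The group obtained has order 60 and therefore coincides with the alternating group on five points.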


16. Some Remarks on More General Infinite Groups 


The nature of the present work has restricted us to a discussion of the 
simplest and most general results concerning groups, although we have 
given some special theorems about groups of finite order. This preference 
for finite groups is justified by the fact that, both historically and in their 
importance for the whole of mathematics, finite groups have formed the 
backbone of group theory. Moreover, the general theory of infinite 
groups is not yet very well developed. Of greatest importance here are the 
finitely generated groups and the topological groups. To conclude the 
present chapter, we give one striking example for each of these two classes 
of groups. 


16.1. Free Groups with Finitely Many Generators 


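Let $\{E_1, \dots, E_n\}$ be a finite set of distinct elements, $E_i \neq E_k$ for $i \neq k$.
We consider words $W$ of the form

(1)  $W = E_{i_1}^{\varepsilon_1} \cdots E_{i_r}^{\varepsilon_r}$,

$i_\rho = 1, \dots, n$,

$\varepsilon_\rho = \pm 1$,

$\varepsilon_{\rho+1} \neq -\varepsilon_\rho$ for $i_\rho = i_{\rho+1}$.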
A word of this sort is to be combined with another such word

$V = E_{k_1}^{\delta_1} \cdots E_{k_s}^{\delta_s}$

by the following rule:

$WV = E_{i_1}^{\varepsilon_1} \cdots E_{i_{r-m-1}}^{\varepsilon_{r-m-1}} E_{k_{m+2}}^{\delta_{m+2}} \cdots E_{k_s}^{\delta_s}$,

if $i_r = k_1$ and $\varepsilon_r = -\delta_1$,
and $i_{r-1} = k_2$, $\varepsilon_{r-1} = -\delta_2$,
$\dots$
and $i_{r-m} = k_{m+1}$, $\varepsilon_{r-m} = -\delta_{m+1}$,
but $i_{r-m-1} \neq k_{m+2}$ or $\varepsilon_{r-m-1} \neq -\delta_{m+2}$;

in other words, the factors at the junction of $W$ and $V$ are cancelled as long
as they are inverse to each other.

Then $WV$ is again a word of the form (1) if we consider the empty set $\emptyset$
as a word (the so-called empty word). If we now define $W\emptyset = \emptyset W = W$
for all words $W$, the set of words with the operation defined in this way
forms a group $\mathfrak{F}_n$, called the free group of rank $n$. Its unit element is $\emptyset = 1$,
and $W^{-1} = E_{i_r}^{-\varepsilon_r} \cdots E_{i_1}^{-\varepsilon_1}$ is the inverse of (1). Clearly the structure of this
group depends only on the number $n$ and not on the special set $\{E_1, \dots, E_n\}$;
moreover, $\mathfrak{F}_n = \langle E_1, \dots, E_n\rangle$. The group $\mathfrak{F}_n$ may be considered as the
most general group that can be generated by $n$ elements. More precisely,
every group $\mathfrak{G}$ that can be generated by $n$ elements $G_1, \dots, G_n$ is isomorphic
to a factor group of $\mathfrak{F}_n$. For example, an isomorphism of this sort results,
by the homomorphism theorem, from the mapping

$W = E_{i_1}^{\varepsilon_1} \cdots E_{i_r}^{\varepsilon_r} \to G_{i_1}^{\varepsilon_1} \cdots G_{i_r}^{\varepsilon_r}$,

which is easily seen to be a homomorphism. On the other hand, it is
clear from §6.4 that every factor group of $\mathfrak{F}_n$ can be generated by $n$
elements.
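
The multiplication of words in a free group amounts to cancellation at the junction of the two factors. The following is a minimal sketch, assuming a reduced word is stored as a tuple of pairs (index of the generator, exponent $\pm 1$) and the empty word as the empty tuple; the function names are illustrative and not part of the text.

```python
def multiply(w, v):
    """Product of two reduced words: cancel inverse factors where w and v meet."""
    w, v = list(w), list(v)
    while w and v and w[-1][0] == v[0][0] and w[-1][1] == -v[0][1]:
        w.pop()        # last factor of w ...
        v.pop(0)       # ... cancels against the first factor of v
    return tuple(w + v)

def inverse(w):
    """Inverse word: reverse the order of the factors and change every exponent."""
    return tuple((i, -e) for i, e in reversed(w))

W = ((1, 1), (2, -1), (1, 1))    # E1 E2^-1 E1
V = ((1, -1), (2, 1), (2, 1))    # E1^-1 E2 E2

print(multiply(W, V))            # ((1, 1), (2, 1)), that is, E1 E2
print(multiply(W, inverse(W)))   # (), the empty word
print(multiply(W, ()))           # ((1, 1), (2, -1), (1, 1)), W itself
```

Since both factors are reduced, the only place where new cancellations can arise is the junction, so that the result is again a reduced word.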

If a system $R_1, \dots, R_t$ of elements of $\mathfrak{F}_n$, taken together with their
conjugates under $\mathfrak{F}_n$, generates the kernel of a homomorphism $\lambda$ of $\mathfrak{F}_n$
onto $\mathfrak{G}$, then the set of elements

$R_1, \dots, R_t$

is said to define $\mathfrak{G}$. This term comes from the fact that if a product of
elements in the system of generators $\{E_1^{\lambda}, \dots, E_n^{\lambda}\}$ of $\mathfrak{G}$ is equal to 1,
then it can be obtained by constructing the inverses, transforms, and
products of the elements $R_i$, $i = 1, \dots, t$, and then modifying them by
replacing $E_i$ by $E_i^{\lambda}$, $i = 1, \dots, n$.
For example,

$E_1^2, \ E_2^3, \ E_1E_2E_1E_2$

or

$E_1^2, \ E_2^3, \ (E_1E_2)^2$

define the group $\mathfrak{S}^{(3)}$.

For a given $R_1, \dots, R_t \in \mathfrak{F}_n$, the question whether an element $W \in \mathfrak{F}_n$
lies in the normal subgroup generated by the $R_i$ and their conjugates is
usually very difficult. It is the so-called word problem; cf. IA, §5.3.


16.2. Topological Groups 

If a topology is defined on a group $\mathfrak{G}$ regarded as a set of elements,
the group $\mathfrak{G}$ is said to be a topological group, provided the topology
is consistent with the operation of the group; more precisely, if the
mapping $(G, H) \to GH^{-1}$, $G, H \in \mathfrak{G}$, of the product space $\mathfrak{G} \times \mathfrak{G}$
into $\mathfrak{G}$ is continuous.

For example, the group $R^{+}$ is a topological group if for every $\varepsilon > 0$
the set of numbers $U_{a,\varepsilon} = \{a^{*} \mid a^{*} \in R^{+}, \ |a - a^{*}| < \varepsilon\}$ is taken as a
neighborhood of $a$. The continuity of the mapping $(a, b) \to a - b$ is seen as follows: if
$U_{a-b,\varepsilon}$ is a neighborhood of $a - b$, then $a^{*} - b^{*} \in U_{a-b,\varepsilon}$ for $a^{*} \in U_{a,\varepsilon/2}$,
$b^{*} \in U_{b,\varepsilon/2}$, as is easily shown. Thus the image of the neighborhood
$V = U_{a,\varepsilon/2} \times U_{b,\varepsilon/2}$ of $(a, b)$ under the mapping $(a, b) \to a - b$ lies in
$U_{a-b,\varepsilon}$, so that the mapping is continuous.

More picturesquely, though perhaps somewhat less precisely, we may
say that a topological group is defined if the product $G^{*}H^{*-1}$ “approaches”
$GH^{-1}$ as $G^{*}$ and $H^{*}$ “approach” $G$ and $H$. Then we may exploit the
resources of topology, in much the same way as permutations were used
above for finite groups, to investigate the structure of a group in which a
topology is defined.


CHAPTER 3 


Linear Algebra* 


Outline 


The statements in this outline are not intended to be complete or rigorous 
and they sometimes refer to concepts that are not explained until later in the 
chapter. 


The solutions of a system of linear equations consist of n-tuples of
numbers: $x = (x^1, \dots, x^n)$;¹ it may be that there is no solution, exactly
one solution, or infinitely many. We are interested here in the existence 
of solutions, and particularly in how many of them there are. It turns 
out that the set of solutions (of a homogeneous system of equations, to 
which a nonhomogeneous system can be reduced) has a definite algebraic 
structure: in it we can perform the operations of addition and of 
multiplication by a number; in short, we can construct linear combinations. 
Such a structure is called a vector space and its elements are vectors. The 
name is to be explained by the historical association with geometry and 
physics, but here we are discussing a purely algebraic situation. The theory 
of solutions of systems of linear equations is part of the more inclusive theory 
of a certain algebraic structure, a “vector space.” 

Starting from the n-tuples of numbers, we arrive at this structure by
defining the two operations mentioned above and then working out the
rules of computation that hold for these operations, whereupon the rules
of computation become axioms for the “vector space” thus defined. To
begin with, the n-tuples of numbers are merely examples or models of
vector spaces, but it turns out that the vectors of any vector space can be
represented by n-tuples of numbers (coordinates), though only after a
(largely arbitrary) choice of vectors as basis vectors.

* The authors of this chapter are deeply indebted to G. Pickert for his valuable advice
and assistance.

¹ As is customary in tensor analysis, we make use of subscripts and superscripts in
such a way as to indicate the behavior of the magnitude in question under a transformation
of coordinates. Except for −1 and for the exponents in (3) on page 286 there are
no exponents in the present chapter.

Thus, in the theory of vector spaces there are two points of view to be 
distinguished: either we base the development solely on the rules of 
computation (the axioms of the vector space) and produce a coordinate- 
free theory, or else we introduce coordinates, in which case we must 
subsequently make our theory independent of the special choice of basis; 
that is, we must investigate the invariants of a transformation of basis or 
at least ask what happens under such a transformation. In any case, any 
concrete representation of vectors will usually be in the form of 
coordinates. 

The two points of view will be presented here side by side. In §1 the
foundations of the theory are developed. In §2 we investigate linear
mappings of a vector space $V_n$ of dimension $n$ into a vector space $V_k$ of
dimension $k$. Generally speaking, these mappings will themselves form
a vector space of dimension $n \cdot k$. In the coordinates of the vector spaces
$V_n$ and $V_k$ the mappings or transformations are represented by matrices.
For the mappings of $V_n$ into itself ($V_k = V_n$) it is possible, besides the
operations of the vector space, to introduce a multiplication, namely
successive application of mappings. The resulting structure is a ring, and
we obtain the foundations of the theory of matrices.

A change of basis in $V_n$ and $V_k$ results in a certain transformation of
the matrix of the mapping. In this way a given matrix can be transformed
into a diagonal matrix, and one of the applications of this particular
transformation is the solution of a system of linear equations, the problem
from which we started out.

Linear mappings of the vectors of $V_n$ onto numbers (i.e., those linear
mappings for which $V_k$ is a vector space of dimension 1; in particular, the
domain of scalars) are linear forms. They constitute the vector space dual
to $V_n$, which is important in applications to geometry and elsewhere
(cf. II8, §4).

In §3 we set ourselves the problem of introducing products of vectors. 
It turns out that we cannot satisfy all the rules for computation with 
numbers. After making a suitable choice of these rules we can state the 
requirements for such a product as follows: the taking of a product is to 
be a bilinear mapping of the pairs of vectors of V onto the vectors of a new 
vector space W. The various kinds of products result from various choices 
of W. If W is the domain of scalars, we obtain the inner or scalar product; 
then it is natural to investigate those changes of basis that leave the factors 
of the inner product invariant (of course, the inner product itself is 
invariant by definition); and thus we are led to the orthogonal transformations.
In the applications an important role is played by the reduction
of symmetric matrices to diagonal form by means of orthogonal transformations.
We shall deal with this problem in the complex plane; i.e.,
we investigate unitary transformations of Hermitian forms, since the
proofs are the same as in the real field; but we must first develop the
theory of determinants.

If for $W$ we choose a vector space of dimension $n \cdot n$, we obtain the
tensor product, and as its alternating part, so to speak, the outer or
alternating product. The outer product of several factors leads to the
determinant. The vanishing of the outer product characterizes the linear
dependence of vectors. Since a system of (homogeneous) linear equations
can be regarded as a query concerning the linear dependence of certain
vectors (the column vectors of the coefficient matrix), we again obtain
an insight into the theory of such systems of equations.


1. The Concept of a Vector Space 


1.1. Introduction 


We start with the problem of finding the solutions of a system of linear
equations

(I)  $\sum_{\nu=1}^{n} a_{\nu}^{\kappa} x^{\nu} = b^{\kappa} \quad (\kappa = 1, \dots, k)$.

In the applications the $a_{\nu}^{\kappa}$, $b^{\kappa}$ are generally real or complex numbers.
For the present we need only assume that they are elements of a field,
which we call the domain of scalars $S$. Up to §3.2 we may even dispense
with the commutative law. A system of elements that satisfies all the
axioms for a field (cf. IB1, §3.2) except commutativity of multiplication,
is called a skew field. Thus fields are themselves skew fields; an example
of a noncommutative skew field, i.e., of a skew field that is not a field,
is given by the quaternions (see IB8, §3).

Thus we assume that the domain of scalars $S$ is a skew field, and we call
its elements scalars. The set of equations

(II)  $\sum_{\nu=1}^{n} a_{\nu}^{\kappa} x^{\nu} = 0 \quad (\kappa = 1, \dots, k)$

is called the homogeneous system corresponding to (I). We shall nowhere
need to assume that (I) is strictly nonhomogeneous, i.e., that at least one
$b^{\kappa} \neq 0$.
The solutions of such a system of equations are n-tuples of elements of
$S$, which we write in the form:

$\mathfrak{x} = (x^1, \dots, x^n) = (x^{\nu})_{\nu = 1, \dots, n}$

or more briefly, if $n$ is known, as $(x^{\nu})$.

Now our chief questions are whether such a system of equations has 
solutions and, if so, how many. Methods for numerical calculation of the 
solutions will turn up incidentally, but they are not the object of our 
investigation. 

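We see at once that if (I) has two solutions

$\mathfrak{x} = (x^1, \dots, x^n), \quad \mathfrak{y} = (y^1, \dots, y^n)$,

then $\mathfrak{z} = (x^1 - y^1, \dots, x^n - y^n)$ is a solution of (II).
If (II) has several solutions $\mathfrak{x}_\lambda = (x_\lambda^1, \dots, x_\lambda^n)$, $\lambda = 1, \dots, l$, then all
linear combinations

$\mathfrak{z} = \sum_{\lambda=1}^{l} \mathfrak{x}_\lambda c^{\lambda} = \Bigl(\sum_{\lambda=1}^{l} x_\lambda^1 c^{\lambda}, \dots, \sum_{\lambda=1}^{l} x_\lambda^n c^{\lambda}\Bigr)$

with arbitrary $c^{\lambda}$ in $S$ are solutions of (II). Thus it is natural to introduce
an addition for n-tuples and a multiplication by scalars, and then to
investigate the algebraic structure of the resulting configuration (for the
definition of this word see IB10).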
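
These two observations are easy to verify on a small numerical example. The sketch below assumes the rational numbers as domain of scalars (so that multiplication is commutative) and uses a made-up system of two equations in three unknowns; the matrix `A`, the right-hand side `b`, and the helper `apply` are illustrative and do not occur in the text.

```python
from fractions import Fraction

# Coefficients a^kappa_nu of a small example system (I), one row per equation
A = [[1, 2, -1],
     [2, 4, -2]]
b = [3, 6]

def apply(A, x):
    """Left-hand sides of the system for the n-tuple x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Two solutions of (I); their difference solves the homogeneous system (II)
x = [3, 0, 0]
y = [1, 1, 0]
z = [xi - yi for xi, yi in zip(x, y)]
print(apply(A, x) == b, apply(A, y) == b)  # True True
print(apply(A, z))                         # [0, 0]

# Two solutions of (II) and a linear combination with arbitrary scalars
x1, x2 = [2, -1, 0], [1, 0, 1]
c1, c2 = Fraction(5, 3), Fraction(-7, 2)
comb = [u * c1 + v * c2 for u, v in zip(x1, x2)]
print(apply(A, comb) == [0, 0])            # True
```

The scalars are written to the right of the components, in anticipation of the right-multiplication introduced in §1.2.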


1.2. Calculation with n-tuples 


Two n-tuples are equal if the corresponding scalars are equal; in other
words,

(G)  $(x^{\nu}) = (y^{\nu})$ if and only if $x^{\nu} = y^{\nu}$ for every $\nu$.

Of course, this definition of equality is part of the definition of an n-tuple
as a mapping of the numbers $1, \dots, n$ into the set of scalars.
It is easy to show that equality as thus defined is reflexive, symmetric,
and transitive and is therefore an equivalence (see IA, §8.3 and 5).
Addition can be introduced as follows: to the n-tuples $\mathfrak{x} = (x^{\nu})$, $\mathfrak{y} = (y^{\nu})$
we assign the n-tuple $(x^{\nu} + y^{\nu})$ as their sum. The sum defined in this way
clearly has the following properties:

(A1) Existence and uniqueness: two n-tuples $\mathfrak{x}$, $\mathfrak{y}$ have exactly one
n-tuple as their sum, which we denote by $\mathfrak{x} + \mathfrak{y}$;

(A2) Consistency with equality: from $\mathfrak{x} = \mathfrak{u}$ and $\mathfrak{y} = \mathfrak{v}$ it follows that
$\mathfrak{x} + \mathfrak{y} = \mathfrak{u} + \mathfrak{v}$.
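(A2) is a special case of the general principle of equality in logic: if $A(\mathfrak{x}, \mathfrak{y}, \mathfrak{z})$
is a statement about $\mathfrak{x}, \mathfrak{y}, \mathfrak{z}$, then $A(\mathfrak{x}, \mathfrak{y}, \mathfrak{z})$ and $\mathfrak{x} = \mathfrak{u}$, $\mathfrak{y} = \mathfrak{v}$, $\mathfrak{z} = \mathfrak{w}$ imply
$A(\mathfrak{u}, \mathfrak{v}, \mathfrak{w})$. In our case $A(\mathfrak{x}, \mathfrak{y}, \mathfrak{z})$ is the statement $\mathfrak{x} + \mathfrak{y} = \mathfrak{z}$. Only this special
case, and the corresponding case (M2) for multiplication, will be needed below.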


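On the basis of (A1) and (A2) our addition is an operation defined on
the set of n-tuples, and the corresponding statement holds below for (M1),
(M2). We write

(A)  $\mathfrak{x} + \mathfrak{y} = (x^{\nu}) + (y^{\nu}) = (x^{\nu} + y^{\nu})$.

By (G), (A), and the rules for a skew field we have the

(A3) associative law: $\mathfrak{x} + (\mathfrak{y} + \mathfrak{z}) = (\mathfrak{x} + \mathfrak{y}) + \mathfrak{z}$;
(A4) commutative law: $\mathfrak{x} + \mathfrak{y} = \mathfrak{y} + \mathfrak{x}$;
(A5) neutral element: there exists an n-tuple $\mathfrak{o}$ such that $\mathfrak{x} + \mathfrak{o} = \mathfrak{x}$ for
every $\mathfrak{x}$, namely $\mathfrak{o} = (0, \dots, 0)$. It follows that there can be only one
neutral element, for if $\mathfrak{o}$ and $\mathfrak{o}'$ were two such elements, we would have
$\mathfrak{o}' + \mathfrak{o} = \mathfrak{o}'$ and $\mathfrak{o} + \mathfrak{o}' = \mathfrak{o}$, so that by (A4) and the transitivity of
equality $\mathfrak{o}' = \mathfrak{o}$;
(A6) inverse elements: for every $\mathfrak{x} = (x^{\nu})$ there exists an $\mathfrak{x}'$ with
$\mathfrak{x} + \mathfrak{x}' = \mathfrak{o}$, namely $\mathfrak{x}' = (-x^{\nu})$. We write $\mathfrak{x}' = -\mathfrak{x}$ and $\mathfrak{y} + (-\mathfrak{x}) = \mathfrak{y} - \mathfrak{x}$.

From (A1) to (A6) follows (A5′): for every pair of n-tuples $\mathfrak{u}$, $\mathfrak{v}$ there
exists exactly one n-tuple $\mathfrak{x}$ with $\mathfrak{u} + \mathfrak{x} = \mathfrak{v}$, namely $\mathfrak{x} = \mathfrak{v} - \mathfrak{u}$.
On the other hand, (A5) and (A6) follow from (A1) to (A4) and (A5′),
if we postulate the existence of at least one element. (Cf. also IB1, §2.3
and IB2, §2.4.)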


Multiplication by scalars is introduced as follows. Since we shall
naturally wish to be able, for example, to write $\mathfrak{x} + \mathfrak{x} = \mathfrak{x} \cdot 2$, i.e.,

$(x^{\nu}) \cdot 2 = (x^{\nu}) + (x^{\nu}) = (x^{\nu} + x^{\nu}) = (x^{\nu} \cdot 2)$,

we define right-multiplication of an n-tuple $(x^{\nu})$ by an element $s$ of $S$
(S-multiplication on the right; the scalar $s$ is written to the right of the
vector $\mathfrak{x}$) by setting

(S-M)  $\mathfrak{x} \cdot s = (x^1, \dots, x^n) \cdot s = (x^1 \cdot s, \dots, x^n \cdot s)$,

or, more concisely,

$(x^{\nu}) \cdot s = (x^{\nu} \cdot s)$.


This scalar product, as well as the product in S and later the product of 
two matrices, will be denoted by writing one factor after the other with or 
without an intervening dot, depending on whether or not the dot seems conducive 
to clarity. Since these three kinds of products will be distinguished from one 
another by the notation for their factors, they can all be written in the same way. 
It must be pointed out that $s \cdot \mathfrak{x}$ is not yet defined, even when $S$ is
commutative. For the time being we shall not require this left-multiplication;
when it is needed later, we shall define it by setting $s \cdot (x^{\nu}) = (s \cdot x^{\nu})$.
If $S$ is commutative, then $s \cdot \mathfrak{x} = \mathfrak{x} \cdot s$.

This definition (of scalar multiplication on the right) implies

(M1) the existence and uniqueness of the S-product: for every n-tuple $\mathfrak{x}$
and every $s$ in $S$ there exists exactly one n-tuple $\mathfrak{y} = \mathfrak{x} \cdot s$.

In the same way as for addition, it is easy to show that
(M2) the S-multiplication is consistent with the equality: $\mathfrak{x} = \mathfrak{y}$ and
$s = t$ implies $\mathfrak{x} \cdot s = \mathfrak{y} \cdot t$;
(M3) the associative law: $\mathfrak{x} \cdot (s \cdot t) = (\mathfrak{x} \cdot s) \cdot t$; the distributive laws
(M4) $(\mathfrak{x} + \mathfrak{y}) \cdot s = \mathfrak{x} \cdot s + \mathfrak{y} \cdot s$,
(M5) $\mathfrak{x} \cdot (s + t) = \mathfrak{x} \cdot s + \mathfrak{x} \cdot t$;
and further,
(M6) if 1 is the unit element of the skew field $S$, then $\mathfrak{x} \cdot 1 = \mathfrak{x}$.


A set of elements (here n-tuples) in which these rules for computation 
are defined, is called a vector space. Abstractly we make the following 
definition: 


Let $V$ be a set in which there is defined an addition satisfying the laws
(axioms) (A1) to (A6); let $S$ be a skew field, and let there be defined an
S-multiplication satisfying the laws (M1) to (M6). Then $V$ is called a vector
space over $S$, and its elements are called vectors.


Thus, $V$ is a commutative group with respect to addition. The skew field $S$
is a skew subfield (up to isomorphism) of the ring of endomorphisms of this
group. For if to each element $s$ of $S$ we assign a mapping $\sigma : \mathfrak{x} \to \mathfrak{x} \cdot s$, then
(M4) means that $\sigma$ is an endomorphism, (M3) states that the product of two
endomorphisms is defined by successive application of the mapping, and
(M5) is the usual definition of the sum of two endomorphisms. (See IB1, §2.4
or B. L. van der Waerden [2], page 148). Consequently, from a given
commutative group we can construct a vector space by selecting a skew field
from the ring of endomorphisms of the group.


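Consequences of the axioms of a vector space are:
1) If 0 is the zero element of $S$, then $\mathfrak{x} \cdot 0 = \mathfrak{o}$ for every $\mathfrak{x}$.
Proof: Since

$\mathfrak{x} \cdot 0 = \mathfrak{x} \cdot (0 + 0) = \mathfrak{x} \cdot 0 + \mathfrak{x} \cdot 0$,

therefore, by (A5′),

$\mathfrak{x} \cdot 0 = \mathfrak{x} \cdot 0 - \mathfrak{x} \cdot 0 = \mathfrak{o}$.

2) For every $s$ in $S$ we have $\mathfrak{o} \cdot s = \mathfrak{o}$. The proof is analogous to that
of (1).
3) Conversely, if $\mathfrak{x} \cdot s = \mathfrak{o}$, then $s = 0$ or $\mathfrak{x} = \mathfrak{o}$.

Proof. Let $\mathfrak{x} \cdot s = \mathfrak{o}$ and $s \neq 0$. Then

$\mathfrak{x} = \mathfrak{x} \cdot 1 = \mathfrak{x} \cdot s \cdot (1/s) = \mathfrak{o} \cdot (1/s) = \mathfrak{o}$.

4) $-\mathfrak{x} = \mathfrak{x} \cdot (-1)$.
Proof. $\mathfrak{x} + \mathfrak{x} \cdot (-1) = \mathfrak{x} \cdot (1 - 1) = \mathfrak{o}$.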
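
For n-tuples the laws (M3) to (M6) and the consequences just derived can be confirmed on sample values. The sketch below assumes the rational numbers as domain of scalars, which is a commutative special case of the skew field $S$; the names `add` and `rmul` and the sample values are chosen for the illustration only.

```python
from fractions import Fraction as F

def add(x, y):
    """Sum of two n-tuples, formed component by component as in (A)."""
    return tuple(a + b for a, b in zip(x, y))

def rmul(x, s):
    """Right-multiplication of an n-tuple by the scalar s, as in (S-M)."""
    return tuple(a * s for a in x)

x = (F(1, 2), F(-3), F(4, 5))
y = (F(2), F(1, 3), F(0))
s, t = F(3, 7), F(-2)
o = (F(0), F(0), F(0))                       # the neutral element

checks = {
    "M3": rmul(x, s * t) == rmul(rmul(x, s), t),
    "M4": rmul(add(x, y), s) == add(rmul(x, s), rmul(y, s)),
    "M5": rmul(x, s + t) == add(rmul(x, s), rmul(x, t)),
    "M6": rmul(x, F(1)) == x,
    "consequence 1": rmul(x, F(0)) == o,
    "consequence 2": rmul(o, s) == o,
    "consequence 3": rmul(rmul(x, s), 1 / s) == x,   # x = x * s * (1/s)
    "consequence 4": add(x, rmul(x, F(-1))) == o,
}
print(all(checks.values()))  # True
```

Exact rational arithmetic is used so that the equalities hold without rounding error.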


Often (for example, in IB6, §8) we speak of a vector space over $S$ even when $S$
is not a skew field but only a ring (with unit element). But then, besides
(A1 to A6) and (M1 to M6) it is necessary to postulate the existence of a basis
(see §1.3); for if $S$ is not a skew field, it is no longer possible (as in §1.3) to
deduce the existence of a basis from (A1 to A6) and (M1 to M6).


1.3. Linear Dependence. Basis 


Addition and S-multiplication can be combined into the concept of a
“linear combination.” A vector $\mathfrak{c}$ is said to be a linear combination of the
vectors $\mathfrak{a}_1, \dots, \mathfrak{a}_n$, or to be linearly dependent on these vectors, if there
exist $c^1, \dots, c^n$ in $S$ such that

$\mathfrak{c} = \mathfrak{a}_1c^1 + \dots + \mathfrak{a}_nc^n = \sum_{\nu=1}^{n} \mathfrak{a}_\nu c^{\nu}$.


With regard to the notation, we shall usually omit the sign $\sum$ by agreeing
once and for all (with Einstein) that summation is to be taken over equal Greek
superscripts and Greek subscripts. Of course, it must be clear from the context
what values the indices are to assume.

We use Greek letters (as variables) for the indices when we mean that all
possible values of these indices are to be assumed successively (in the language
of logic, a Greek letter denotes a bound index variable; in our notation for
n-tuples the binding can be expressed by $(x^{\nu})_{\nu = 1, \dots, n}$). If we are referring to a
definite value of the index, we use a Latin letter. Consequently, summation
is not to be taken over Latin indices.


Thus, a vector space over $S$ is a set in which it is always possible to form
linear combinations of its elements with the elements of $S$. It is to this
characteristic feature that the concept “vector space” owes its importance.


Preliminary discussion. If $\mathfrak{e}_1, \dots, \mathfrak{e}_n$ are given vectors, then the set of
linear combinations $\mathfrak{x} = \mathfrak{e}_\nu x^{\nu}$, $x^{\nu} \in S$, forms a vector space, as is easily
shown. Conversely, we shall obtain a good picture of a given vector space
$V$ if we succeed in finding a set (presumably finite) of vectors $\mathfrak{e}_1, \dots, \mathfrak{e}_n$
such that every vector $\mathfrak{x}$ in $V$ is representable (as far as possible, uniquely)
as a linear combination $\mathfrak{x} = \mathfrak{e}_\nu x^{\nu}$, and we shall see that in fact this is
always possible. But then the vector is determined by the n-tuple $(x^{\nu})$, so
that the vector space $V$ is characterized by the n-tuples of $S$. Thus a set of
n-tuples is not only a particular example of a vector space, but every
vector space is representable as the set of n-tuples in $S$; the various vector
spaces over $S$ are distinguished from one another only by the number $n$.

Development. To carry out these ideas precisely we shall need the
following concepts: the vectors $\mathfrak{a}_1, \dots, \mathfrak{a}_n$ are said to be linearly dependent
if there exist scalars $c^1, \dots, c^n$, not all 0, such that $\mathfrak{a}_\nu c^{\nu} = \mathfrak{o}$; but if $\mathfrak{a}_\nu c^{\nu} = \mathfrak{o}$
implies that all $c^{\nu} = 0$, the given vectors are said to be linearly independent.

The following theorems are immediate consequences of the definition: 


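Theorem 1a: if for some $i$ $(1 \le i \le n)$ we have $\mathfrak{a}_i = \mathfrak{o}$, then $\mathfrak{a}_1, \dots, \mathfrak{a}_n$
are linearly dependent.


Theorem 1b: if the vectors $\mathfrak{a}_1, \dots, \mathfrak{a}_n$ are linearly dependent, then so
are the vectors $\mathfrak{a}_1, \dots, \mathfrak{a}_n, \mathfrak{a}_{n+1}, \dots, \mathfrak{a}_{n+p}$.


Theorem 1c: if the vectors $\mathfrak{a}_1, \dots, \mathfrak{a}_n, \mathfrak{a}_{n+1}, \dots, \mathfrak{a}_{n+p}$ are linearly
independent, then so are the vectors $\mathfrak{a}_1, \dots, \mathfrak{a}_n$.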


Note. The order of the vectors has no effect on linear dependence. 


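The system $\mathfrak{e}_1, \dots, \mathfrak{e}_n$ is called a basis of $V$ if for every vector $\mathfrak{a}$ in $V$ there
exist $n$ elements $a^{\nu}$ in $S$ such that

1) $\mathfrak{a} = \mathfrak{e}_\nu a^{\nu}$ (reminder to the reader: summation over $\nu$ is from 1 to $n$),
2) $\mathfrak{a} = \mathfrak{e}_\nu a^{\nu} = \mathfrak{e}_\nu b^{\nu}$ implies $a^{\nu} = b^{\nu}$ for all $\nu$; in other words, the
representation is unique.

The $a^{\nu}$ are called the coordinates of $\mathfrak{a}$ with respect to the basis
$(\mathfrak{e}_\nu)$. Condition (2) is equivalent to

2′) $\mathfrak{e}_1, \dots, \mathfrak{e}_n$ are linearly independent.

Proof. (a) Assume (2) and $\mathfrak{e}_\nu c^{\nu} = \mathfrak{o}$. Since $\mathfrak{e}_\nu \cdot 0 = \mathfrak{o}$, it follows from
(2) that $c^{\nu} = 0$ for all $\nu$.

(b) Assume (2′) and $\mathfrak{e}_\nu a^{\nu} = \mathfrak{e}_\nu b^{\nu}$.
Then it follows that $\mathfrak{e}_\nu(a^{\nu} - b^{\nu}) = \mathfrak{o}$, so that by (2′) we have $a^{\nu} - b^{\nu} = 0$
for all $\nu$.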

A vector space can have various bases; but we shall see that the number 
(n) of basis elements is always the same. This number is called the dimension 
of the vector space, which we then denote by YV,, . 

The above assertion follows easily from the next theorem.

Theorem 2: If the n + 1 vectors $\mathfrak{a}_1, \ldots, \mathfrak{a}_{n+1}$ are linear combinations of
n vectors $\mathfrak{e}_1, \ldots, \mathfrak{e}_n$,

$\mathfrak{a}_\kappa = \mathfrak{e}_\nu a_\kappa^\nu, \quad \nu = 1, \ldots, n, \quad \kappa = 1, \ldots, n + 1,$

then $\mathfrak{a}_1, \ldots, \mathfrak{a}_{n+1}$ are linearly dependent.


The proof is by complete induction. 


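Initial step: for n = 1 we have $\mathfrak{a}_1 = \mathfrak{e}_1 a_1^1$, $\mathfrak{a}_2 = \mathfrak{e}_1 a_2^1$. If $a_1^1 = 0$ or
$a_2^1 = 0$, then the assertion is correct by Theorem 1a. Otherwise,

$\mathfrak{a}_1 (a_1^1)^{-1} - \mathfrak{a}_2 (a_2^1)^{-1} = \mathfrak{o}.$

Completion of the induction: let

$\mathfrak{a}_1 = \mathfrak{e}_1 a_1^1 + \mathfrak{e}_2 a_1^2 + \cdots + \mathfrak{e}_n a_1^n$
$\mathfrak{a}_2 = \mathfrak{e}_1 a_2^1 + \mathfrak{e}_2 a_2^2 + \cdots + \mathfrak{e}_n a_2^n$
$\cdots$
$\mathfrak{a}_{n+1} = \mathfrak{e}_1 a_{n+1}^1 + \mathfrak{e}_2 a_{n+1}^2 + \cdots + \mathfrak{e}_n a_{n+1}^n.$


If all $a_\kappa^1 = 0$, then $\mathfrak{a}_1, \ldots, \mathfrak{a}_n$ are linear combinations of the n − 1 vectors
$\mathfrak{e}_2, \ldots, \mathfrak{e}_n$, so that the assertion is correct by the induction hypothesis and
Theorem 1b. We may assume $a_1^1 \ne 0$, since for $a_1^1 = 0$, $a_\kappa^1 \ne 0$ the proof is
exactly the same. Then the n vectors


$\mathfrak{b}_\kappa = \mathfrak{a}_\kappa - \mathfrak{a}_1 (a_1^1)^{-1} a_\kappa^1 \quad (\kappa = 2, \ldots, n + 1)$


are linear combinations of the n − 1 vectors $\mathfrak{e}_2, \ldots, \mathfrak{e}_n$, so that they are
linearly dependent by the induction hypothesis; that is, there exist
$x^2, \ldots, x^{n+1}$, not all = 0, such that


$\mathfrak{b}_2 x^2 + \cdots + \mathfrak{b}_{n+1} x^{n+1} = \mathfrak{o}.$

Substitution of the above expressions for the $\mathfrak{b}_\kappa$ gives

$-\mathfrak{a}_1 (a_1^1)^{-1} (a_2^1 x^2 + \cdots + a_{n+1}^1 x^{n+1}) + \mathfrak{a}_2 x^2 + \cdots + \mathfrak{a}_{n+1} x^{n+1} = \mathfrak{o},$

where at least one of the coefficients of $\mathfrak{a}_2, \ldots, \mathfrak{a}_{n+1}$ is $\ne 0$.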
From Theorem 2 follows


Theorem 3: if V has a basis of n elements $\mathfrak{e}_1, \ldots, \mathfrak{e}_n$, then:


a) Every n + 1 elements of V are linearly dependent.
b) Consequently, no basis of V has more than n elements.
c) No basis can consist of fewer than n elements; for otherwise $\mathfrak{e}_1, \ldots, \mathfrak{e}_n$
would be linearly dependent.
d) Every n linearly independent vectors $\mathfrak{v}_1, \ldots, \mathfrak{v}_n$ of V form a basis. For
if $\mathfrak{a}$ is an arbitrary element of V, then by (a) the vectors $\mathfrak{v}_1, \ldots, \mathfrak{v}_n, \mathfrak{a}$ are
linearly dependent; that is, there exist $a^\nu$ ($\nu = 0, 1, \ldots, n$), not all = 0,
such that


$\mathfrak{a} a^0 + \mathfrak{v}_1 a^1 + \cdots + \mathfrak{v}_n a^n = \mathfrak{o}.$


Here $a^0 \ne 0$, since otherwise $\mathfrak{v}_1, \ldots, \mathfrak{v}_n$ would be linearly dependent,
and therefore $\mathfrak{a}$ is representable as a linear combination of $\mathfrak{v}_1, \ldots, \mathfrak{v}_n$.

Thus the dimension n of V can also be characterized by the fact that 
there exist n linearly independent vectors in V and every n + 1 vectors are 
linearly dependent. If and only if it has this property does the vector
space V have a basis of n elements. But this characterization of dimension is 
independent of the choice of basis. 


The concept of a ‘basis’ was first introduced by Dedekind for modules 
(supplement XI to Dirichlet’s Zahlentheorie, 3rd ed., 1879, §165) and was 
used for ideals. He did not require that the representation be unique, and he 
allowed the domain of coefficients to be a ring. The concept of a “basis” in 
this sense occurs in Chapter 5, §3.2. 


The dimension of a vector space over S determines the vector space
uniquely up to isomorphism; in other words, we have the isomorphism
theorem: two vector spaces V, V' over the same domain of scalars S are
isomorphic if they have the same dimension.

The proof is almost trivial when we consider what is to be proved.
Vector spaces V and V' are said to be isomorphic if there exists a one-to-one
mapping of V onto V' ($\mathfrak{x} \to \mathfrak{x}'$) with the property that


$(\mathfrak{x} + \mathfrak{y})' = \mathfrak{x}' + \mathfrak{y}'$ and $(\mathfrak{x} s)' = \mathfrak{x}' s.$


But if $\mathfrak{e}_1, \ldots, \mathfrak{e}_n$ is a basis of V, and $\mathfrak{e}'_1, \ldots, \mathfrak{e}'_n$ is a basis of V', then
$\mathfrak{e}_\nu x^\nu \to \mathfrak{e}'_\nu x^\nu$ is a mapping with the required properties.

In analysis, an important role is played by vector spaces of infinitely 
many dimensions; for example, the real functions that can be represented 
as a Fourier polynomial in the interval $(-\pi, +\pi)$ form a vector space
over the field of real numbers; for this vector space the functions 1,
$\cos(\nu x)$, $\sin(\nu x)$ ($\nu = 1, 2, \ldots$) are a basis. More generally, the representation
of functions in an orthogonal series may be regarded as a representation 
of the vector space formed by the functions (cf. III, 11). But in the present 
chapter we consider only vector spaces of finite dimension. (For the existence 
of a basis in a vector space that is not necessarily finite-dimensional, see 
IB11, §3, Theorem 2.) 

When we consider the sets of n-tuples of elements of S as a vector space,
as we did in §1.2, we are representing the space in terms of the special
basis $\mathfrak{e}_1 = (1, 0, \ldots, 0)$, $\mathfrak{e}_2 = (0, 1, 0, \ldots, 0)$, ..., $\mathfrak{e}_n = (0, 0, \ldots, 0, 1)$.

It is easy to see that S itself is a vector space of dimension 1 over S.


1.4. Vector Subspaces


A subset U of a vector space V over S is called a vector subspace if U 
itself is a vector space over S and if the addition and the S-product in U 
are the same as in V. 

If U is a vector subspace of V, we have: 


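1) if $\mathfrak{a} \in U$ and $\mathfrak{b} \in U$, then $\mathfrak{a} + \mathfrak{b} \in U$;
2) if $\mathfrak{a} \in U$ and $s \in S$, then $\mathfrak{a} \cdot s \in U$.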


These two conditions are also sufficient for a non-empty subset U to be 
a vector subspace of V. To prove the sufficiency, we must verify the laws 
(A), (M) for U. The associative law 


$(\mathfrak{x} + \mathfrak{y}) + \mathfrak{z} = \mathfrak{x} + (\mathfrak{y} + \mathfrak{z})$


holds for arbitrary elements of V, and thus in particular when $\mathfrak{x}, \mathfrak{y}, \mathfrak{z}$ are
elements of U. The same remark holds for all those laws in which the words
“there exists” do not occur. The existence of the sum and the S-product
in U is guaranteed by (1) and (2). It remains to verify (A5) and (A6). As
for (A5), for an arbitrary $\mathfrak{x} \in U$ we have $\mathfrak{x} \cdot 0 \in U$, and from §1.2, corollary 1,
it follows that $\mathfrak{x} \cdot 0 = \mathfrak{o}$. As for (A6), if $\mathfrak{x}$ is an element of U, then so is
$\mathfrak{x} \cdot (-1)$ (see §1.2, corollary 4).

The intersection of two or arbitrarily many vector subspaces is again 
a vector subspace. For the proof we need only verify (1), (2). The 
necessary definitions are: 


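$\mathfrak{x} \in U_1 \cap U_2$ if and only if $\mathfrak{x} \in U_1$ and $\mathfrak{x} \in U_2$.


If $\mathfrak{M}$ is a non-empty set of vector subspaces, then $\mathfrak{x} \in \bigcap_{U \in \mathfrak{M}} U$ if and
only if $\mathfrak{x} \in U$ for all $U \in \mathfrak{M}$.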

If $\mathfrak{a}_1, \ldots, \mathfrak{a}_k$ are arbitrarily given elements of $V_n$, the smallest subspace
U of $V_n$ containing $\mathfrak{a}_1, \ldots, \mathfrak{a}_k$ is called the subspace spanned or generated
by $\mathfrak{a}_1, \ldots, \mathfrak{a}_k$. The word “smallest” here means that U is the intersection
of all subspaces containing $\mathfrak{a}_1, \ldots, \mathfrak{a}_k$, so that the existence of U is
guaranteed. Of course, it may happen that $U = V_n$.

A basis for the subspace spanned by the vectors $\mathfrak{a}_1, \ldots, \mathfrak{a}_k$ can be found
by writing the $\mathfrak{a}_1, \ldots, \mathfrak{a}_k$ in any order and then striking out every $\mathfrak{a}_i$ that
is linearly dependent on its predecessors.

To prove this statement we enumerate the vectors in such a way that each
of the first l vectors $\mathfrak{a}_1, \ldots, \mathfrak{a}_l$ is linearly independent of its predecessors
but the vectors $\mathfrak{a}_{l+1}, \ldots, \mathfrak{a}_k$ are linearly dependent on $\mathfrak{a}_1, \ldots, \mathfrak{a}_l$, and then
show that

1) $\mathfrak{a}_1, \ldots, \mathfrak{a}_l$ are linearly independent.


The proof is by induction on l. For l = 1 the statement is correct, since
we may assume that $\mathfrak{a}_1 \ne \mathfrak{o}$. We now assume that $\mathfrak{a}_1, \ldots, \mathfrak{a}_{l-1}$ are linearly
independent. If $\mathfrak{a}_1, \ldots, \mathfrak{a}_l$ were linearly dependent, there would exist
scalars $x^1, \ldots, x^l$, not all 0, such that $\mathfrak{a}_1 x^1 + \cdots + \mathfrak{a}_l x^l = \mathfrak{o}$. If $x^l = 0$,
then $\mathfrak{a}_1, \ldots, \mathfrak{a}_{l-1}$ would be linearly dependent, and if $x^l \ne 0$, then $\mathfrak{a}_l$ would
be linearly dependent on $\mathfrak{a}_1, \ldots, \mathfrak{a}_{l-1}$.

2) Every subspace U, even a smallest one, which contains $\mathfrak{a}_1, \ldots, \mathfrak{a}_k$,
certainly contains $\mathfrak{a}_1, \ldots, \mathfrak{a}_l$ and all linear combinations $\mathfrak{a}_1 x^1 + \cdots + \mathfrak{a}_l x^l$.
But the totality of these linear combinations is a subspace containing
$\mathfrak{a}_1, \ldots, \mathfrak{a}_k$, and is thus the desired subspace.
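
The procedure of striking out every vector that depends on its predecessors can be sketched as follows. The sketch assumes n-tuples over the rational numbers; the function name `sift` and the bookkeeping by elimination are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def sift(vectors):
    """Keep exactly those tuples that are not linear combinations of their predecessors."""
    reduced = []  # pairs (pivot column, reduced row), one per vector kept so far
    kept = []
    for vec in vectors:
        v = [Fraction(c) for c in vec]
        # subtract suitable multiples of the rows kept so far
        for col, row in reduced:
            if v[col] != 0:
                factor = v[col] / row[col]
                v = [a - factor * b for a, b in zip(v, row)]
        pivot = next((j for j, c in enumerate(v) if c != 0), None)
        if pivot is not None:
            # a nonzero remainder means vec is independent of its predecessors
            reduced.append((pivot, v))
            kept.append(vec)
    return kept
```

The result depends on the order in which the vectors are written, but by Theorem 3 the number of vectors kept does not.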

The basis $\mathfrak{a}_1, \ldots, \mathfrak{a}_l$ of a subspace can be extended to a basis of $V_n$. For,
either the $\mathfrak{a}_1, \ldots, \mathfrak{a}_l$ already span $V_n$ or else there exists a vector $\mathfrak{a}_{l+1}$,
linearly independent of them, that can then be adjoined as a further basis
vector for $V_n$. Since the dimension of $V_n$ is finite, this procedure comes to
an end after finitely many steps.


1.5. Change to a New Basis 


After choice of a basis the vectors in $V_n$ can be represented as n-tuples from S.
In the applications where it is natural to choose some special basis, the vectors 
are often given in this way. 

Thus it is important to determine how the coordinates of a vector behave 
under a change of basis or, as is often said, of coordinate system. 


Let $\mathfrak{e}_1, \ldots, \mathfrak{e}_n$ and $\mathfrak{e}_{1'}, \ldots, \mathfrak{e}_{n'}$ be two bases of the vector space $V_n$, where
for greater uniformity in our subsequent notation the primes have been
written not on the $\mathfrak{e}$ but on the indices. Thus 1', ..., n' are simply marks to
identify the elements of the second (the “primed”) basis; of course, there
are n of them.

Every vector of one basis can be expressed as a linear combination of
the vectors of the other basis. The coordinates of the basis vectors will be
denoted by $t^{\nu}_{\nu'}$ and $t^{\nu'}_{\nu}$, respectively (later also, cf. §2.3, by $s^{\kappa}_{\kappa'}$ and $s^{\kappa'}_{\kappa}$,
respectively):


(1a) $\mathfrak{e}_{\nu'} = \mathfrak{e}_\nu t^{\nu}_{\nu'}$,
(1b) $\mathfrak{e}_\nu = \mathfrak{e}_{\nu'} t^{\nu'}_{\nu}$.


The t with primed superscripts and the t with primed subscripts are to be
carefully distinguished. They are defined in completely different ways.


The purpose of using the same letter t in these two cases is to make the
equations for transformation of coordinates easy to remember. One only needs
to recall that summation is taken over any index appearing both as superscript
and subscript and that the indices over which summation is not taken occur
either only as subscripts or only as superscripts. It was for this reason that we
put the primes on the indices rather than on the $\mathfrak{e}$.

The relations between the $t^{\nu}_{\nu'}$ and the $t^{\nu'}_{\nu}$ are found by substituting (1a)
into (1b):


$\mathfrak{e}_\nu = \mathfrak{e}_\kappa t^{\kappa}_{\mu'} t^{\mu'}_{\nu}$


(summation for $\kappa$ is from 1 to n, and for $\mu'$ it is from 1' to n').
Since the $\mathfrak{e}_\kappa$ are linearly independent, it follows that


(2a) $t^{\kappa}_{\mu'} t^{\mu'}_{\nu} = \delta^{\kappa}_{\nu} = \begin{cases} 1 & \text{for } \kappa = \nu, \\ 0 & \text{for } \kappa \ne \nu. \end{cases}$

In the same way, by substitution of (1b) into (1a) we obtain


(2b) $t^{\mu'}_{\kappa} t^{\kappa}_{\nu'} = \delta^{\mu'}_{\nu'} = \begin{cases} 1 & \text{for } \mu' = \nu', \\ 0 & \text{for } \mu' \ne \nu'. \end{cases}$


The Kronecker symbols $\delta$ with two indices, which may also occur either
both as superscripts or both as subscripts, will always be used in this sense.

The equations (2) show that not every arbitrary system of n · n elements
$t^{\nu}_{\nu'}$ of S can occur as the schema of coefficients for a transformation of
basis: corresponding to a system $t^{\nu}_{\nu'}$ there must exist a second system $t^{\nu'}_{\nu}$
such that equations (2) are satisfied. But this condition is also sufficient.
For we assert that, if for the system $t^{\nu}_{\nu'}$ there exists a system $t^{\nu'}_{\nu}$ such that
the equations (2) are satisfied, and if $\mathfrak{e}_1, \ldots, \mathfrak{e}_n$ is a basis of $V_n$, then
the vectors $\mathfrak{e}_{\nu'} = \mathfrak{e}_\nu t^{\nu}_{\nu'}$ also form a basis of $V_n$.

For the proof of this assertion, it is only necessary, since n linearly
independent vectors of $V_n$ form a basis (Theorem 3d), to show that the
$\mathfrak{e}_{\nu'}$ are linearly independent; that is, that
$\mathfrak{e}_{\nu'} c^{\nu'} = \mathfrak{o}$ implies $c^{\nu'} = 0$ for all $\nu'$.

Now we have $\mathfrak{e}_{\nu'} c^{\nu'} = \mathfrak{e}_\kappa t^{\kappa}_{\nu'} c^{\nu'}$. Since the $\mathfrak{e}_\kappa$ are linearly independent, the
right-hand side can be $= \mathfrak{o}$ only if $t^{\kappa}_{\nu'} c^{\nu'} = 0$ for every $\kappa$. Multiplying the
$\kappa$th of these equations with $t^{\mu'}_{\kappa}$ (where $\mu'$ is arbitrary) and summing, we
obtain:


$t^{\mu'}_{\kappa} t^{\kappa}_{\nu'} c^{\nu'} = 0.$


From (2b) it follows that $c^{\mu'} = 0$ and, of course, this procedure can be
carried out for every $\mu'$.


The reader is advised to make the computation for a few simple examples
(say for n = 2). In the case n = 2 the determination of the $t^{\nu'}_{\nu}$ for a given $t^{\nu}_{\nu'}$
requires the solution of four linear equations which for commutative S can be
solved if and only if $t^{1}_{1'} t^{2}_{2'} - t^{1}_{2'} t^{2}_{1'} \ne 0$.
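
One such computation for n = 2 might run as follows. The sketch assumes rational scalars and stores the coefficients as a list of rows, with the superscript as row index and the subscript as column index; the names `inverse_2x2` and `matmul` and the particular basis are chosen here for illustration.

```python
from fractions import Fraction

def inverse_2x2(t):
    """Second system of coefficients for a given 2x2 system, or None if none exists."""
    (a, b), (c, d) = [[Fraction(x) for x in row] for row in t]
    det = a * d - b * c  # the expression that must be different from zero
    if det == 0:
        return None
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(p, q):
    """Sum over the index that is a subscript in p and a superscript in q."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]

# Columns hold the coordinates of the new basis vectors:
# e_1' = e_1 and e_2' = e_1 + e_2.
T = [[1, 1], [0, 1]]
T_inv = inverse_2x2(T)
identity = [[1, 0], [0, 1]]
```

Here `matmul(T, T_inv)` and `matmul(T_inv, T)` both give the Kronecker symbol, in agreement with (2a) and (2b), whereas a system such as `[[1, 2], [2, 4]]` admits no second system at all.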


Let us now ask how the coordinates of a vector $\mathfrak{x}$ are transformed under
the change of basis (1). Let the coordinates of $\mathfrak{x}$ with respect to the basis
$(\mathfrak{e}_\nu)$ be $x^\nu$, and with respect to the basis $(\mathfrak{e}_{\nu'})$ let them be $x^{\nu'}$, so that we have


(3) $\mathfrak{x} = \mathfrak{e}_\nu x^\nu = \mathfrak{e}_{\nu'} x^{\nu'}$.

Substitution of (1b) in (3) gives

$\mathfrak{e}_{\nu'} x^{\nu'} = \mathfrak{e}_{\nu'} t^{\nu'}_{\nu} x^\nu,$

so that, since the $\mathfrak{e}_{\nu'}$ are linearly independent,

(4a) $x^{\nu'} = t^{\nu'}_{\nu} x^\nu$.

The solution of this system of equations for the $x^\nu$ is obtained either by
substituting (1a) in (3) or by multiplying (4a) with $t^{\kappa}_{\nu'}$ and using (2):

(4b) $x^\nu = t^{\nu}_{\nu'} x^{\nu'}$.

Here the $x^\nu$ and $x^{\nu'}$ are coordinates of the same vector with respect to
different bases. The “kernel letter” x denotes the (fixed) vector, and the
change of basis is expressed in the index. This “kernel-index notation” is
due to Schouten.
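
A small numerical instance of (4a) and (4b), under the assumption that n = 2 and that the new basis is $\mathfrak{e}_{1'} = \mathfrak{e}_1$, $\mathfrak{e}_{2'} = \mathfrak{e}_1 + \mathfrak{e}_2$ (a basis chosen here purely for illustration), may help to fix the ideas. The helper `apply` is likewise hypothetical.

```python
def apply(matrix, coords):
    """Sum over the repeated index: one row of coefficients per new coordinate."""
    return [sum(row[j] * coords[j] for j in range(len(coords))) for row in matrix]

# Superscript = row, subscript = column.
T = [[1, 1], [0, 1]]        # unprimed superscript, primed subscript, as in (1a)
T_inv = [[1, -1], [0, 1]]   # primed superscript, unprimed subscript, as in (1b)

x_old = [3, 2]                 # the vector e_1 * 3 + e_2 * 2
x_new = apply(T_inv, x_old)    # (4a): coordinates in the primed basis
x_back = apply(T, x_new)       # (4b): back to the unprimed basis
```

The new coordinates are 1 and 2, and indeed $\mathfrak{e}_{1'} \cdot 1 + \mathfrak{e}_{2'} \cdot 2 = \mathfrak{e}_1 \cdot 3 + \mathfrak{e}_2 \cdot 2$: the vector is the same, only its coordinates have changed.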


2. Linear Transformations of Vector Spaces 


2.1. General Properties of Linear Transformations 

In the equations §1.5 (4) we could also interpret $x^\nu$ and $x^{\nu'}$ as coordinates
of different vectors with respect to the same basis. In the kernel-index
notation such a transformation has the form


(1) $u^\kappa = a_\nu^\kappa x^\nu$;


that is, we use different kernel letters for the two vectors and do not use
any primed indices.

By (1) there is assigned to each vector $\mathfrak{x}$ exactly one vector $\mathfrak{u}$, which we
denote by $\mathfrak{u} = A\mathfrak{x}$ (A is to be read: alpha) and


(Lv) from $\mathfrak{x} = \mathfrak{y}$ follows $A\mathfrak{x} = A\mathfrak{y}$.


Thus we have a mapping or transformation of $V_n$ into itself, with the
following properties:


(La) $A(\mathfrak{x} + \mathfrak{y}) = A\mathfrak{x} + A\mathfrak{y}$,
(Lm) $A(\mathfrak{x} \cdot s) = (A\mathfrak{x}) \cdot s$.


Transformations with these properties are said to be linear. They are
simply the homomorphisms of $V_n$. Since it will be necessary for us to consider
them in various forms, we will in general take A, B, ... to be linear
transformations of $V_n(S)$ into a $\bar{V}_k(S)$. Thus the dimensions of the two
vector spaces are not required to be the same. We shall be particularly
interested in the cases $\bar{V}_k = V_n$ and $\bar{V}_k = V_1 = S$, and when it is desirable
to indicate the dimensions, we shall speak of an n × k transformation.

We do not assume that every vector in $\bar{V}_k$ is the image of a vector in $V_n$.
The set of vectors in $\bar{V}_k$ which are images of vectors in $V_n$ is denoted by
$A(V_n)$ and is called the image domain (cf. IA, §8.4) or the image space
of the transformation A. We first note that $A(V_n)$ is a vector space over S
and is thus a vector subspace of $\bar{V}_k$.

To prove this we must show that $\bar{\mathfrak{x}} \in A(V_n)$, $\bar{\mathfrak{y}} \in A(V_n)$, $s \in S$ implies


(1) $\bar{\mathfrak{x}} + \bar{\mathfrak{y}} \in A(V_n)$, (2) $\bar{\mathfrak{x}} s \in A(V_n)$.


We outline the proof for (2) and leave the proof of (1) to the reader. Since
$\bar{\mathfrak{x}} \in A(V_n)$ means that there exists an $\mathfrak{x} \in V_n$ such that $A\mathfrak{x} = \bar{\mathfrak{x}}$, it follows
from (Lm) that $\bar{\mathfrak{x}} s = (A\mathfrak{x}) s = A(\mathfrak{x} s) \in A(V_n)$.

The dimension r of the image space $A(V_n) = \bar{V}_r$ is called the rank of
the transformation A. It is obvious that $r \le n$.

We now turn our attention to the set of n × k transformations themselves.
In this set we can introduce an algebraic structure as follows:
two transformations are said to be equal, A = B, if the equation $A\mathfrak{x} = B\mathfrak{x}$
holds (cf. IA, §8.4) for all $\mathfrak{x}$ in $V_n$. This equality is reflexive, symmetric, and
transitive.

An addition is defined by


(2) $(A + B)\mathfrak{x} = A\mathfrak{x} + B\mathfrak{x}$,
and an S-multiplication on the left by
(3) $(s \cdot A)\mathfrak{x} = s \cdot (A\mathfrak{x})$.


For this purpose it is necessary that $\bar{V}_k$ be not only a right-space but also
a left-space over S. This situation certainly holds if $\bar{V}_k = S$; and if S is
commutative (cf. §1.2), it can easily be brought about by defining $s\mathfrak{x} = \mathfrak{x}s$.
Our applications will be confined to these two cases.


Theorem 1: With respect to these operations the n × k transformations
form a vector space $L_n^k$ of dimension n · k.

What we must prove is: 

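1) A + B is a linear transformation. To show this we must verify
(Lv), (La), and (Lm). As an example, we shall give the proof for (La),
leaving the other proofs here and below to the reader.

$(A + B)(\mathfrak{x} + \mathfrak{y}) = A(\mathfrak{x} + \mathfrak{y}) + B(\mathfrak{x} + \mathfrak{y})$ by (2),
$= A\mathfrak{x} + A\mathfrak{y} + B\mathfrak{x} + B\mathfrak{y}$ by (La), applied to A and B,
$= (A + B)\mathfrak{x} + (A + B)\mathfrak{y}$
on the basis of the associative and commutative law for addition in $\bar{V}_k$
and (2).

2) The sum A + B satisfies the conditions (A1–A6). As an example, let
us prove (A2). The assertion is that if for every $\mathfrak{x}$ we have $A\mathfrak{x} = B\mathfrak{x}$ and
$\Gamma\mathfrak{x} = \Delta\mathfrak{x}$, then for every $\mathfrak{x}$ it follows that


$(A + \Gamma)\mathfrak{x} = (B + \Delta)\mathfrak{x}$.
But
$(A + \Gamma)\mathfrak{x} = A\mathfrak{x} + \Gamma\mathfrak{x}$ by (2),
$= B\mathfrak{x} + \Delta\mathfrak{x}$ by (A2), applied in $\bar{V}_k$,
$= (B + \Delta)\mathfrak{x}$ by (2).


The zero element is the transformation $O\mathfrak{x} = \bar{\mathfrak{o}}$, where $\bar{\mathfrak{o}}$ is the zero element
of $\bar{V}_k$.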

3) sA is a linear transformation. 

4) The S-multiplication satisfies (M1—M6), but with left s-factors. 


The theorem further states that the dimension of $L_n^k$ depends on the
dimensions of $V_n$ and $\bar{V}_k$. To prove this statement we shall require a
representation of the transformations in terms of a basis $(\mathfrak{e}_\nu)$ of $V_n$ and
$(\bar{\mathfrak{e}}_\kappa)$ of $\bar{V}_k$. Therefore we shall postpone this proof until we have completed
our discussion of coordinate-free theorems.

The multiplication of one transformation by another is defined in the
usual way as successive application of the two transformations:


(4) $(AB)\mathfrak{x} = A(B\mathfrak{x})$.


Here B is a transformation of $V_n$ into $\bar{V}_k$, and A is a transformation of
$\bar{V}_k$ into a new vector space $W_l$, and in the product of several transformations
additional vector spaces are introduced in the same way. It is
permitted, but not required, that these vector spaces be distinct from one
another.

The product of two transformations is defined only if the image space
of the first transformation is contained in the preimage space of the
second. Moreover we have defined the sum of two transformations only
for transformations from the same $V_n$ into the same $\bar{V}_k$. But so far as the
sums and products exist, they satisfy the conditions for a ring. Besides those
already stated for addition, these conditions are:

5) Consistency of multiplication with equality: if for all $\mathfrak{x}$ we have
$A\mathfrak{x} = B\mathfrak{x}$ and $\Gamma\mathfrak{x} = \Delta\mathfrak{x}$, then for all $\mathfrak{x}$ we also have $A\Gamma\mathfrak{x} = B\Delta\mathfrak{x}$.


Proof. By hypothesis $\Gamma\mathfrak{x} = \Delta\mathfrak{x}$. Thus it follows from (Lv) that
$A(\Gamma\mathfrak{x}) = A(\Delta\mathfrak{x})$ and then from $A\mathfrak{x} = B\mathfrak{x}$ that $A(\Delta\mathfrak{x}) = B(\Delta\mathfrak{x})$ for all $\mathfrak{x}$,
so that the assertion is true by transitivity of equality.

6) The associative law:
$A(B\Gamma) = (AB)\Gamma$.


This law holds generally for transformations with successive application
as the rule for multiplication (see IB2, §1.2.5).


7) The distributive laws:
$A(B + \Gamma) = AB + A\Gamma$; $(A + B)\Gamma = A\Gamma + B\Gamma$.


The proofs always depend on the same fundamental idea. 
The existence of sum and product is in every case guaranteed if we 
consider only the set of linear transformations of V,, into itself: 


Theorem 2: The linear transformations of a vector space $V_n$ into
itself form a ring with respect to the addition (2) and the multiplication (4).


Corollary: This ring has a unit element (for rings, also called unity
element), namely the transformation E with $E\mathfrak{x} = \mathfrak{x}$. (For the concept
of a ring see IB1, §2.4, and IB5, §1.2.)

Under what circumstances does there exist, for a transformation with
$A(V_n) = \bar{V}_r \subset \bar{V}_k$, an inverse transformation $\bar{A}$ such that $\bar{A}(\bar{V}_r) = V_n$?
Since the dimension of $\bar{A}(\bar{V}_r)$ is smaller than or equal to r, and on the
other hand $r \le n$, we must have r = n.

But this necessary condition is also sufficient, as can be shown, for
example, in the following way: Let $(\mathfrak{e}_\nu)$ be a basis of $V_n$, so that the images
$A\mathfrak{e}_\nu = \bar{\mathfrak{a}}_\nu$ form a basis of $\bar{V}_n$. The desired inverse transformation is then
given by $\bar{A}\bar{\mathfrak{a}}_\nu = \mathfrak{e}_\nu$, as the reader may easily show. (The argument is
similar to the one at the beginning of §2.2; cf. also the end of §2.3.)

By the definition of $\bar{A}$ it follows that $\bar{A}A = E$, so that $\bar{A}$ is a left inverse
of A. But if we start from the basis $(\bar{\mathfrak{a}}_\nu)$ in $\bar{V}_n$, we see that $A\bar{A} = E$, so that
$\bar{A}$ is also a right inverse of A.

But $\bar{A}$ is uniquely determined by A: for if $\bar{A}A = E$ and $BA = E$, then
by multiplying the second equation on the right with $\bar{A}$ we see that $\bar{A} = B$.
Thus we may call $\bar{A}$ the inverse mapping for A, and denote it by $A^{-1}$. The
transformation $A^{-1}$ itself has an inverse (since it is of rank n) and in fact
$(A^{-1})^{-1} = A$.

Thus, a transformation A is invertible (i.e., has an inverse) if and only if
its rank, i.e., the dimension of the image space, is equal to the dimension of
the original space.


2.2. Matrices 


Let us now consider the representation of linear transformations in
terms of a basis $(\mathfrak{e}_\nu)$ of $V_n$ and a basis $(\bar{\mathfrak{e}}_\kappa)$ of $\bar{V}_k$. From (La) and (Lm) it
follows that


$A\mathfrak{x} = A(\mathfrak{e}_\nu x^\nu) = (A\mathfrak{e}_\nu) x^\nu.$

Consequently, after choice of a basis $(\mathfrak{e}_\nu)$ of $V_n$ the transformation A is
completely determined if the images of basis elements

$A\mathfrak{e}_\nu = \bar{\mathfrak{a}}_\nu$

are given.
In terms of a basis of $\bar{V}_k$ we have $\bar{\mathfrak{a}}_\nu = \bar{\mathfrak{e}}_\kappa a_\nu^\kappa$.


After choice of a basis $(\mathfrak{e}_\nu)$ of $V_n$ and a basis $(\bar{\mathfrak{e}}_\kappa)$ of $\bar{V}_k$ the transformation
A is completely determined by the “rectangular array” of n · k elements of S


$$\mathfrak{A} = \begin{pmatrix} a_1^1 & \cdots & a_n^1 \\ \vdots & & \vdots \\ a_1^k & \cdots & a_n^k \end{pmatrix} = (a_\nu^\kappa)^{\kappa = 1, \ldots, k}_{\nu = 1, \ldots, n}.$$


An array of this sort is called an n × k (n by k) matrix.

Here the superscript outside the bracket indicates the row, and the
subscript indicates the column. The notation $(a_{\kappa\nu})_{\kappa = 1, \ldots, k;\ \nu = 1, \ldots, n}$ for
matrices is also common; in this case the first index indicates the row.
If the position and range of values of $\kappa$, $\nu$ are clear, we write $\mathfrak{A} = (a_\nu^\kappa)$.

The expression “rectangular array” means simply that to every pair of
numbers $(\kappa, \nu)$ there is assigned an element $a_\nu^\kappa$ of the domain of scalars.
Thus the matrix is a mapping of the pairs of numbers into the domain of
scalars. By the general definition for equality of mappings (see IA, §8.4)
we thus have


(1) $\mathfrak{A} = \mathfrak{B}$ if and only if $a_\nu^\kappa = b_\nu^\kappa$ for all $\kappa$, $\nu$.

If we set $A\mathfrak{x} = \mathfrak{y} = \bar{\mathfrak{e}}_\kappa y^\kappa$, then

$\bar{\mathfrak{e}}_\kappa y^\kappa = A\mathfrak{x} = \bar{\mathfrak{a}}_\nu x^\nu = \bar{\mathfrak{e}}_\kappa a_\nu^\kappa x^\nu.$


Since the $\bar{\mathfrak{e}}_\kappa$ are linearly independent, the transformation can also be
represented by the system of equations


(2) $y^\kappa = a_\nu^\kappa x^\nu.$


It was from such a system that we started out in the first place, and now
we have shown that every linear transformation of $V_n$ can be represented
in this way.

Let us now examine the question of uniqueness. Let $\mathfrak{A} = (a_\nu^\kappa)$ and
$\mathfrak{B} = (b_\nu^\kappa)$ be the matrices assigned in a given coordinate system to the
transformations A, B. It is obvious that if $a_\nu^\kappa = b_\nu^\kappa$ for all $\kappa$, $\nu$, then
$A\mathfrak{x} = B\mathfrak{x}$ for all $\mathfrak{x}$; that is, A = B.

Conversely, for a pair of indices i, j assume $a_j^i \ne b_j^i$; then there certainly
exists a vector $\mathfrak{x}$ for which $A\mathfrak{x} \ne B\mathfrak{x}$, for example, the vector $\mathfrak{e}_j$ with
coordinates $\delta_j^\nu$. By (2) its image vectors have the coordinates


$y^\kappa = a_\nu^\kappa \delta_j^\nu = a_j^\kappa$ and $z^\kappa = b_\nu^\kappa \delta_j^\nu = b_j^\kappa$,


which differ from one another in the ith coordinate.

Thus A = B if and only if $a_\nu^\kappa = b_\nu^\kappa$ for all $\kappa$, $\nu$. Then (1) shows that
A = B if and only if $\mathfrak{A} = \mathfrak{B}$.

For matrices we now wish to introduce the rules of computation which 
will correspond to those already introduced for transformations in such 
a way that the resulting ‘“‘configurations”’ (cf. IB10, §1.2) for the matrices 
are isomorphic to the corresponding configurations for the trans- 
formations. 

So we ask: which matrix will correspond to the sum $A + B = \Gamma$? By
§2.1 (2) we have $\Gamma\mathfrak{x} = A\mathfrak{x} + B\mathfrak{x}$, so that, if matrices are denoted by the
corresponding letters:


$c_\nu^\kappa x^\nu = a_\nu^\kappa x^\nu + b_\nu^\kappa x^\nu = (a_\nu^\kappa + b_\nu^\kappa) x^\nu.$


This system of equations is satisfied for all possible vectors $(x^\nu)$ if and
only if


$c_\nu^\kappa = a_\nu^\kappa + b_\nu^\kappa$ for all $\kappa$, $\nu$.
Thus we define the addition of two matrices by
(3) $(a_\nu^\kappa) + (b_\nu^\kappa) = (a_\nu^\kappa + b_\nu^\kappa).$


By an analogous argument we are led to define left-multiplication of a
matrix by an element of S by setting


(4) $s \cdot (a_\nu^\kappa) = (s \cdot a_\nu^\kappa).$


Then the n × k matrices form a vector space isomorphic to $L_n^k$, whose
zero element is given by the matrix $\mathfrak{O}$ with $a_\nu^\kappa = 0$ for all $\kappa$, $\nu$. For this
vector space it is easy to assign a basis, namely, the matrices $\mathfrak{E}_\kappa^\nu$ with 1
in the position occupied by $a_\nu^\kappa$ (note that the indices are interchanged) and zero
elsewhere. Then every matrix $\mathfrak{A}$ can be represented in the form


$\mathfrak{A} = (a_\nu^\kappa) = a_\nu^\kappa \mathfrak{E}_\kappa^\nu,$


where the $\mathfrak{E}_\kappa^\nu$ are linearly independent, since (1) implies that
$a_\nu^\kappa \mathfrak{E}_\kappa^\nu = (a_\nu^\kappa) = \mathfrak{O}$ if and only if $a_\nu^\kappa = 0$. Thus the remaining part of
Theorem 1, that $L_n^k$ has dimension n · k, is proved if we show that isomorphic
vector spaces have the same dimension, a detail which we leave to the
reader.


It is also possible to show without the use of matrices that the
transformations $E_i^j$ (corresponding to the matrices $\mathfrak{E}_i^j$) form a basis, where the
$E_i^j$ are defined by


$E_i^j \mathfrak{e}_j = \bar{\mathfrak{e}}_i$, $E_i^j \mathfrak{e}_l = \bar{\mathfrak{o}}$ for $l \ne j$;
note that summation is not taken over the Latin index j.


Matrix multiplication: the transformation $A(B\mathfrak{x})$ is represented in the
matrix notation by $y^\lambda = a_\kappa^\lambda (b_\nu^\kappa x^\nu)$. Thus the matrix product $\mathfrak{A}\mathfrak{B} = \mathfrak{C}$
is to be defined by


(5) $(c_\nu^\lambda) = (a_\kappa^\lambda b_\nu^\kappa).$


The element $c_\nu^\lambda$ is formed by multiplying the elements of the $\lambda$th row of $\mathfrak{A}$
(as left factors) with those of the $\nu$th column of $\mathfrak{B}$ and adding the products.
Thus the product exists only if the number of elements in a row (i.e., the
number of columns) of $\mathfrak{A}$ is the same as the number of elements in a
column (i.e., the number of rows) of $\mathfrak{B}$.

The isomorphism between matrices and transformations shows that
the same rules of calculation hold here as for transformations.

The following example shows that multiplication of matrices, and 
therefore multiplication of transformations, is not commutative: 


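$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}.$$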
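
The rule (5) is easily carried out mechanically. The following sketch, with matrices stored as lists of rows and with a function name `matmul` chosen here for illustration, multiplies two 2 × 2 matrices in both orders; the particular matrices are an assumption of this example.

```python
def matmul(a, b):
    """Matrix product by rule (5): rows of a (as left factors) times columns of b."""
    if len(a[0]) != len(b):
        raise ValueError("columns of the left factor must match rows of the right factor")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[0, 1], [1, 0]]
B = [[0, -1], [1, 0]]
AB = matmul(A, B)   # A applied after B
BA = matmul(B, A)   # B applied after A
```

The two products differ, so that the order of the factors cannot in general be interchanged.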
But calculation with matrices can also be defined by (1), (3), (4), (5)
independently of the linear transformations, whereupon the rules for
calculation can be verified by direct computation. Then the right side of (2)
can be interpreted as a product of matrices, where $(x^\nu) = \mathfrak{x}$ and $(y^\kappa) = \mathfrak{y}$
are matrices consisting of a single column. In place of (2) we then write


(2') $\mathfrak{y} = \mathfrak{A}\mathfrak{x}$.


Thus the equation §1.5, (4a) would be written in the form $\mathfrak{x}' = \mathfrak{T}\mathfrak{x}$. Here
of course the kernel-index notation must be given up.

Since the matrices corresponding to transformations of $V_n$ into itself
are the square n × n matrices, they form a ring, a fact which again can be
verified by actual computation without any reference to the theory of
linear transformations. The unit element in this ring is the unit matrix


$$\mathfrak{E} = (\delta_\nu^\kappa) = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}.$$

If the transformation A has an inverse transformation $\bar{A}$, then the matrix $\mathfrak{A}$
has an inverse matrix $\bar{\mathfrak{A}}$, which by the remarks at the end of §2.1 is seen
to be both a left and a right inverse. A matrix $\mathfrak{A}$ of this sort, for which
there exists an $\bar{\mathfrak{A}}$ with $\bar{\mathfrak{A}}\mathfrak{A} = \mathfrak{A}\bar{\mathfrak{A}} = \mathfrak{E}$, is called regular. Comparison with
the equations §1.5, (2) shows that a given matrix can be the matrix of
coefficients of a transformation of basis if and only if it is regular.

Not every matrix is regular; for example,


$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} u & v \\ x & y \end{pmatrix} = \begin{pmatrix} u & v \\ 0 & 0 \end{pmatrix}$$

is certainly different from the unit matrix, no matter how the u, v, x, y are
chosen.
If we set u = v = 0 and say x = y = 1, we see that the matrices
$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ and $\begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}$ are divisors of zero. (Their product is zero, but neither
factor is equal to zero.)


2.3. Rank and Transformation of Basis for Matrices 


Now that we have laid the foundations for calculation with matrices, 
let us discuss the following question: 


1) How is the rank of the transformation indicated in the matrix? We
return to the beginning of §2.2. Every vector of $V_n$ is a linear combination
of the $\mathfrak{e}_\nu$, so that every vector of $\bar{V}_r = A(V_n)$ is a linear combination of the $\bar{\mathfrak{a}}_\nu$,
which means that $\bar{V}_r$ is spanned by the $\bar{\mathfrak{a}}_\nu$. Thus by §1.4 a suitably chosen
set of r of the vectors $\bar{\mathfrak{a}}_\nu$ forms a basis of $\bar{V}_r$; in other words: among the
$\bar{\mathfrak{a}}_\nu$ there exist r linearly independent vectors and every r + 1 of these
vectors are linearly dependent. Now the coordinates of the $\bar{\mathfrak{a}}_\nu$ form the
columns of the matrix $\mathfrak{A}$. The maximal number of linearly independent
column vectors is called the column rank of $\mathfrak{A}$, so that we have the theorem:
the column rank of the matrix $\mathfrak{A}$ is equal to the rank of the transformation A.
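
For a matrix with rational elements the column rank can be computed by treating the columns as vectors and eliminating, as in the following sketch. The function name `column_rank` and the restriction to rational scalars are assumptions of this illustration.

```python
from fractions import Fraction

def column_rank(matrix):
    """Maximal number of linearly independent columns of a matrix given as a list of rows."""
    # Each column of the matrix becomes one vector of coordinates.
    cols = [[Fraction(x) for x in col] for col in zip(*matrix)]
    rank = 0
    for j in range(len(matrix)):  # j runs over the coordinates of a column vector
        pivot = next((i for i in range(rank, len(cols)) if cols[i][j] != 0), None)
        if pivot is None:
            continue
        cols[rank], cols[pivot] = cols[pivot], cols[rank]
        # remove the jth coordinate from all later column vectors
        for i in range(rank + 1, len(cols)):
            factor = cols[i][j] / cols[rank][j]
            cols[i] = [a - factor * b for a, b in zip(cols[i], cols[rank])]
        rank += 1
    return rank
```

For example, the matrix with rows (1, 2, 3) and (2, 4, 6) has column rank 1: every column is a multiple of the first, so that the corresponding transformation maps the whole space onto a one-dimensional image space.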


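2) How does the matrix $\mathfrak{A}$ representing a transformation A change with
a change of basis?

Let the transformation A be represented with respect to the bases
$(\mathfrak{e}_\nu)$, $(\bar{\mathfrak{e}}_\kappa)$ by the matrix $\mathfrak{A} = (a_\nu^\kappa)$, and let the image of the vector $\mathfrak{x}$ be


$A\mathfrak{x} = \mathfrak{y}$.


(We write $\mathfrak{y}$ here instead of $\bar{\mathfrak{x}}$ in order to avoid having too many diacritical
marks on the same letter.)


By §2.1, (1) the coordinates of this image vector are


(1) $y^\kappa = a_\nu^\kappa x^\nu$, or in matrix notation: $\mathfrak{y} = \mathfrak{A}\mathfrak{x}$.

Here we consider a vector as a matrix with one column, whose elements are
the coordinates of the vector with respect to the given basis.


We now make a transformation of basis in the two vector spaces:

$x^\nu = t^{\nu}_{\nu'} x^{\nu'}$ or $\mathfrak{x} = \mathfrak{T}\mathfrak{x}'$,
$y^\kappa = s^{\kappa}_{\kappa'} y^{\kappa'}$ or $\mathfrak{y} = \mathfrak{S}\mathfrak{y}'$.

Then from (1), by multiplication with $(s^{\kappa'}_{\kappa}) = \mathfrak{S}^{-1}$, we obtain

$y^{\kappa'} = s^{\kappa'}_{\kappa} a_\nu^\kappa t^{\nu}_{\nu'} x^{\nu'}$ or $\mathfrak{y}' = \mathfrak{S}^{-1}\mathfrak{A}\mathfrak{T}\mathfrak{x}'$.


Thus, the transformation A is now represented, in terms of the new bases, by
the matrix


(2) $\mathfrak{A}' = \mathfrak{S}^{-1}\mathfrak{A}\mathfrak{T}$,
where $\mathfrak{S}$ and $\mathfrak{T}$ are regular matrices.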


Definition: two matrices $\mathfrak{A}$, $\mathfrak{A}'$ that stand in the relation (2) to each
other are said to be equivalent: $\mathfrak{A} \sim \mathfrak{A}'$.

Thus we have the result: if two matrices represent the same transformation,
they are equivalent to each other. The converse is also true; let us
state it in detail: if $\mathfrak{A}$ and $\mathfrak{A}'$ are equivalent matrices and if $\mathfrak{A}$ represents
the transformation A with respect to the bases $(\mathfrak{e}_\nu)$, $(\bar{\mathfrak{e}}_\kappa)$, then there exist
bases $(\mathfrak{e}_{\nu'})$, $(\bar{\mathfrak{e}}_{\kappa'})$ with respect to which the matrix $\mathfrak{A}'$ represents the
transformation A.

It follows that equivalence of matrices is reflexive, symmetric and
transitive, and is thus an equivalence relation (IA, §8.3 and 5), as can also
be shown by actual computation from (2).


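3) Is it possible, by a suitable choice of bases, to represent a given
transformation in a particularly illuminating way?

We had $A\mathfrak{e}_\nu = \bar{\mathfrak{a}}_\nu$. By reindexing (if necessary) we can arrange that the
$\bar{\mathfrak{a}}_1, \ldots, \bar{\mathfrak{a}}_r$ are linearly independent and that $\bar{\mathfrak{a}}_{r+1}, \ldots, \bar{\mathfrak{a}}_n$ are linearly
dependent on these first r vectors. Thus by §1.4 we can choose a basis of
$\bar{V}_k$ such that $\bar{\mathfrak{e}}_1 = \bar{\mathfrak{a}}_1, \ldots, \bar{\mathfrak{e}}_r = \bar{\mathfrak{a}}_r$ (and of course $\bar{\mathfrak{e}}_{r+1}, \ldots, \bar{\mathfrak{e}}_k$ are linearly
independent of them). Then for all $\bar{\mathfrak{a}}_\nu$ the coordinates $a_\nu^\kappa = 0$ for $\kappa > r$;
thus $\mathfrak{A}$ has the form


$$\mathfrak{A} = \begin{pmatrix} 1 & \cdots & 0 & a_{r+1}^1 & \cdots & a_n^1 \\ \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 1 & a_{r+1}^r & \cdots & a_n^r \\ 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 \end{pmatrix}.$$

We now introduce a new basis for $V_n$ as follows: if $\bar{\mathfrak{a}}_{r+1} \ne \bar{\mathfrak{o}}$, we set


$\mathfrak{e}_{(r+1)'} = \mathfrak{e}_{r+1} - \mathfrak{e}_\rho a_{r+1}^\rho$ (summation over $\rho$ from 1 to r), $\mathfrak{e}_{\nu'} = \mathfrak{e}_\nu$ for $\nu \ne r + 1$.
Then
$\bar{\mathfrak{a}}_{(r+1)'} = A\mathfrak{e}_{(r+1)'} = \bar{\mathfrak{a}}_{r+1} - \bar{\mathfrak{a}}_\rho a_{r+1}^\rho$
and thus
$\bar{\mathfrak{a}}_{(r+1)'} = \bar{\mathfrak{o}}.$


It is clear that repetition of this procedure must lead to a new basis in $V_n$
with respect to which the transformation A is represented by the following
matrix:


$$\mathfrak{E}_r = \begin{pmatrix} 1 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 \end{pmatrix},$$

with r ones on the diagonal (in the rows and columns 1, ..., r) and zeros elsewhere.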

Thus we have obtained the theorem: every matrix of rank r is equivalent
to the matrix $\mathfrak{E}_r$. Consequently, if two matrices have the same rank, they
are equivalent. Conversely, it is obvious that equivalent matrices have the 
same rank. Thus equivalent matrices are characterized by their rank alone. 


In saying “rank” here instead of ‘‘column rank’ we are anticipating a result 
at the end of §2.4. 


With respect to the new bases the transformation A is represented by 
the following equations: 


(3) $y^1 = x^1, \ldots, y^r = x^r, \quad y^{r+1} = 0, \ldots, y^k = 0.$


If necessary, we may also consider a transformation of $\bar{V}_k$ into $V_n$ by means
of $\bar{\mathfrak{e}}_1 \to \mathfrak{e}_1, \ldots, \bar{\mathfrak{e}}_r \to \mathfrak{e}_r$ and then say (for the moment, simply for the sake
of visualization) that the transformation represented by (3) is a projection.
Then every linear transformation of a vector space $V_n$ into a vector space
is a projection of $V_n$ onto an r-dimensional vector subspace, where r is the
rank of A. Conversely, it can be shown immediately that every projection
is a linear transformation.

From (3) we see at once what was already proved in §2.1: if r = n, then 
the transformation is invertible. 


2.4. Systems of Linear Equations 


In the system of equations


(I) $\sum_{\nu=1}^{n} a_\nu^\kappa x^\nu = b^\kappa \quad (\kappa = 1, \ldots, k)$

we consider, for every $\nu$, the k-tuples $(a_\nu^1, \ldots, a_\nu^k) = \mathfrak{a}_\nu$ and $(b^1, \ldots, b^k) = \mathfrak{b}$
as vectors in a space $V_k$. In other words, for these k-tuples we introduce
addition and S-multiplication as in §1.2. Then (I) can be written:


(I') $\mathfrak{a}_\nu x^\nu = \mathfrak{b}$,


and the question of solutions becomes simply whether, and in how many
ways, the vector $\mathfrak{b}$ can be represented as a linear combination of the vectors
$\mathfrak{a}_\nu$. The solutions, i.e., the n-tuples $(x^\nu)$, can be regarded in their turn as
vectors of a (different) linear vector space $W_n$.


The following preliminary discussion is expressed in geometric terms, the
$\mathfrak{a}_1, \ldots, \mathfrak{a}_n, \mathfrak{b}$ being thought of as vectors in a k-dimensional affine space. In this
case, however, the $W_n$ cannot be interpreted geometrically.

If the $\mathfrak{a}_1, \ldots, \mathfrak{a}_n$ already span the whole $V_k$, then every $\mathfrak{b} \in V_k$ can be represented
as a linear combination of the $\mathfrak{a}_\nu$. If the $\mathfrak{a}_\nu$ form a basis, i.e., if n = k, this
representation is unique, but if n > k, then it may be possible to choose a basis
for $V_k$ in various ways from the $\mathfrak{a}_1, \ldots, \mathfrak{a}_n$, so that the representation of $\mathfrak{b}$
will no longer be unique.

If the $\mathfrak{a}_1, \ldots, \mathfrak{a}_n$ span a proper subspace $V_l \subset V_k$, l < k, then only those $\mathfrak{b}$
are representable that belong to this subspace. Thus the possibility of a solution
and the total number of solutions will depend on the dimension l of the subspace
$V_l$ spanned by the $\mathfrak{a}_1, \ldots, \mathfrak{a}_n$.


We now repeat the theorems in §1:
If $x = (x^\nu)$, $y = (y^\nu)$ are solutions of (I), then $x - y$ is a solution of


(II) $\sum_{\nu=1}^{n} a_\nu x^\nu = 0$ or $a_\nu x^\nu = 0$.

Thus we can find all solutions of (I) by adding to a particular solution of (I)
all solutions of (II).

If $x$, $y$ are solutions of (II), then $x + y$ and $x \cdot s$ ($s \in S$) are also solutions
of (II).

Thus the solutions of (II) form a vector space $W'$ which is a subspace of
$W_n$. The question of the number of solutions of (II), and thus of (I),
becomes a question of the dimension of $W'$. By the above preliminary
discussion this dimension will depend on the dimension $l$ of $V_l$, so that
we must now bring this latter space into play.

We assume that the $a_1, \ldots, a_l$ are linearly independent and that
$a_{l+1}, \ldots, a_n$ are linearly dependent on them. Of course, this assumption
requires a reindexing of the $a_\nu$ and the $x^\nu$, which can easily be reversed
at the end of the calculation. Thus we assume that


$$a_{l+1} = a_1 c_{l+1}^1 + \cdots + a_l c_{l+1}^l,$$
$$\vdots$$
$$a_n = a_1 c_n^1 + \cdots + a_l c_n^l.$$

If these equations are substituted into (II), we obtain


(1) $\sum_{\lambda=1}^{l} a_\lambda \Bigl( x^\lambda + \sum_{\mu=l+1}^{n} c_\mu^\lambda x^\mu \Bigr) = 0.$


Since the $a_\lambda$ ($\lambda = 1, \ldots, l$) are linearly independent, the system (1), and
with it (II), is satisfied if and only if for every $\lambda$ the system


(2) $x^\lambda + \sum_{\mu=l+1}^{n} c_\mu^\lambda x^\mu = 0$


is satisfied. But from the latter system we can at once read off all possible
solutions as follows. If for $x^{l+1}, \ldots, x^n$ we insert arbitrary elements from
$S$, the corresponding $x^1, \ldots, x^l$ can be calculated uniquely from (2). We
thus obtain as solutions $n - l$ linearly independent vectors $x_1, \ldots, x_{n-l}$
and therewith a basis for $W'$, if we set


$$(x_1^{l+1}, x_1^{l+2}, \ldots, x_1^{n}) = (1, 0, \ldots, 0)$$
$$\vdots$$
$$(x_{n-l}^{l+1}, x_{n-l}^{l+2}, \ldots, x_{n-l}^{n}) = (0, \ldots, 0, 1)$$


and then in each case calculate the corresponding $x_i^1, \ldots, x_i^l$ from (2).
A solution with arbitrary values $x^{l+1}, \ldots, x^n$ is obtained as the linear
combination $x_1 x^{l+1} + x_2 x^{l+2} + \cdots + x_{n-l} x^n$.

In this way we obtain the theorem: if the vectors $a_1, \ldots, a_n$ span an
$l$-dimensional subspace $V_l \subseteq V_k$, then the solutions of (II) form an $(n - l)$-dimensional
vector space $W_{n-l}$.
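
The construction of a fundamental system can be sketched in code. This is only an illustration under stated assumptions, not the authors' procedure verbatim: the scalars are taken to be rational numbers (Python's `Fraction`), the matrix $(a_\nu^\kappa)$ is stored as a list of rows, and the name `fundamental_system` is hypothetical. Instead of reindexing the $a_\nu$, the sketch records which columns turn out to be independent and then sets each remaining unknown in turn to 1 and the others to 0, as in the text.

```python
from fractions import Fraction

def fundamental_system(A):
    """Return a basis of the solutions of the homogeneous system A x = 0.

    A is a list of rows (one row per equation); entries are converted to
    exact rationals so that no rounding occurs.
    """
    rows = [[Fraction(v) for v in row] for row in A]
    n = len(rows[0]) if rows else 0
    pivots = []   # column index of the leading 1 in each reduced row
    r = 0         # number of independent rows found so far
    for c in range(n):
        # Look for a row at or below position r with a nonzero entry in column c.
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue  # column c depends on the pivot columns found earlier
        rows[r], rows[p] = rows[p], rows[r]
        piv = rows[r][c]
        rows[r] = [v / piv for v in rows[r]]
        # Clear column c in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [v - f * w for v, w in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for f in free:
        # Set one free unknown to 1, the other free unknowns to 0 ...
        x = [Fraction(0)] * n
        x[f] = Fraction(1)
        # ... and compute the dependent unknowns from the reduced equations.
        for i, c in enumerate(pivots):
            x[c] = -rows[i][f]
        basis.append(x)
    return basis

A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]
basis = fundamental_system(A)
```

Here the three columns span a two-dimensional space ($l = 2$, $n = 3$), so the fundamental system consists of $n - l = 1$ vector.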


Thus the concept of a vector space is seen to be well adapted to the theory 
of systems of linear equations. 


A basis of $W_{n-l}$ is called a system of fundamental solutions, or more
briefly a fundamental system. The number $l$ of linearly independent vectors
among the $a_1, \ldots, a_n$ is the column rank of the matrix $(a_\nu^\kappa)$.

But how do we determine the $l$ and the $c_\mu^\lambda$? By §1.4 we must decide
which of the $a_\nu$ are linearly dependent on their predecessors. Of course,
this actually means that we must solve systems of linear equations; thus 
we would simply be going around in circles, if it were not possible to 
determine the rank of a system of vectors in some other way. We shall 
return to this question in §3.6, but at present we follow another path: we 
recast the system of equations (I); i.e., we set up another system whose 
solutions are the same as those of (I) but are much easier to perceive. 




What is wanted, of course, is a system of the form

$$x^1 = q^1, \quad x^2 = q^2, \quad \ldots, \quad x^l = q^l;$$

that is, a system whose coefficient matrix has the form $\mathfrak{E}_r$. In §2.2 we have
already spoken about transforming a matrix into such a form; let us now 
examine this question somewhat more closely from our present point of 
view. We must see what such transformations mean for our system of 
equations. 

If $a_1^1 \neq 0$, let us subtract a suitable multiple of the first equation from
the other equations, so that $x^1$ no longer occurs in them, repeating the
same process with the second equation and with $x^2$, if $a_2^2 \neq 0$, and so
forth. In order to have $a_1^1 \neq 0$ we change the order of the equations, if
necessary, or else change the numbering of the $x^\nu$. In order to obtain $a_i^i \neq 0$
at a later stage, it may sometimes be necessary to take both these steps.

We now wish to interpret these operations in our vector space. It is a
matter of showing that they do not change the linear dependence or
independence of the vectors $a_\nu$, $b$.


1) Renumbering the $x^\nu$ means renumbering the $a_\nu$, which produces
no change in the relation of linear dependence.
2) Moreover, this relation is not changed by a change of basis. For
example, the transformation of basis (for arbitrary fixed $i$, $j$)
$e_{i'} = e_j, \quad e_{j'} = e_i, \quad e_{\nu'} = e_\nu$ for $\nu \neq i, j$
merely interchanges the $i$th and $j$th equations.
3) Let us now suppose that $a_1^1 \neq 0$ has been brought about by (1) or (2).


Actually we should here write $a_1^{1'}$ or $a_{1'}^{1'}$, and in each of the successive steps
we should adjoin one more prime, but the reader will allow us in each case
to write only the prime arising from the next step and then, in the final equations,
to write only one prime on the superscript and on the subscript.


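By the transformation

$$e_{1'} = e_1 + e_2 a_1^2 (a_1^1)^{-1}, \qquad e_{2'} = e_2, \ \ldots, \ e_{k'} = e_k$$

we obtain

$$a_\nu = e_{1'} a_\nu^1 + e_2 \bigl( a_\nu^2 - a_1^2 (a_1^1)^{-1} a_\nu^1 \bigr) + e_3 a_\nu^3 + \cdots,$$

i.e.,

$$a_\nu^{2'} = a_\nu^2 - a_1^2 (a_1^1)^{-1} a_\nu^1, \qquad a_\nu^{\kappa'} = a_\nu^\kappa \ \text{ for } \kappa \neq 2,$$

so that in particular $a_1^{2'} = 0$.

With respect to the coefficient matrix $\mathfrak{A} = (a_\nu^\kappa)$, or to the extended
matrix

$$\mathfrak{B} = \begin{pmatrix} a_1^1 & \cdots & a_n^1 & b^1 \\ \vdots & & \vdots & \vdots \\ a_1^k & \cdots & a_n^k & b^k \end{pmatrix},$$
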
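the operations admitted up to now consist of an interchange of columns,
an interchange of rows, and the addition of a multiple of one row to
another row. By repeated application of these rules we can, as a first step,
bring the matrix $\mathfrak{A}$ into a matrix of the form

$$\mathfrak{A}' = \begin{pmatrix}
a_1^{1'} & a_2^{1'} & \cdots & a_l^{1'} & \cdots & a_n^{1'} \\
0 & a_2^{2'} & \cdots & a_l^{2'} & \cdots & a_n^{2'} \\
\vdots & & \ddots & \vdots & & \vdots \\
0 & \cdots & 0 & a_l^{l'} & \cdots & a_n^{l'} \\
0 & \cdots & \cdots & \cdots & \cdots & 0 \\
\vdots & & & & & \vdots \\
0 & \cdots & \cdots & \cdots & \cdots & 0
\end{pmatrix},$$

called an echelon matrix. It is characterized by the fact that in the main
diagonal $a_1^{1'}, \ldots, a_l^{l'} \neq 0$, whereas to the left of this diagonal and in the
rows beginning with the $(l + 1)$th there occur only zeros. The other
elements may be arbitrary.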

From this matrix, or the corresponding system of equations

(3)
$$\begin{aligned}
a_1^{1'} x^1 + a_2^{1'} x^2 + \cdots + a_n^{1'} x^n &= 0 \\
a_2^{2'} x^2 + \cdots + a_n^{2'} x^n &= 0 \\
&\ \,\vdots \\
a_l^{l'} x^l + \cdots + a_n^{l'} x^n &= 0,
\end{aligned}$$

it is easy to see that the solutions can be calculated for an arbitrary choice
of $x^{l+1}, \ldots, x^n$. Also, we can at once read off the column rank of the
transformed matrix $(a_\nu^{\kappa'}) = \mathfrak{A}'$, since it is equal to the number of nonzero
elements in the main diagonal. Since our transformations have made no
change in linear dependence, this number is also the column rank of $\mathfrak{A}$.
If we carry out the same operations on the extended matrix $\mathfrak{B}$, leaving
the position of the column $(b^\kappa)$ unchanged, then instead of (3) we have a system of
equations of the form

$$\begin{aligned}
a_1^{1'} x^1 + \cdots + a_n^{1'} x^n &= b^{1'} \\
&\ \,\vdots \\
a_l^{l'} x^l + \cdots + a_n^{l'} x^n &= b^{l'} \\
0 &= b^{(l+1)'} \\
&\ \,\vdots \\
0 &= b^{k'},
\end{aligned}$$

which is solvable if and only if $b^{(l+1)'} = \cdots = b^{k'} = 0$. In this case the
rank of $\mathfrak{B}'$, and thus also of $\mathfrak{B}$, is equal to the rank of $\mathfrak{A}$, since otherwise
the column of the $b^{\kappa'}$ would provide, after an interchange of columns,
a nonzero element in the main diagonal. Thus we have the theorem:
the system (I) is solvable if and only if the rank of the extended matrix is
equal to the rank of the coefficient matrix.
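
The solvability criterion lends itself to a short numerical check. The following is a minimal sketch, assuming rational coefficients and using hypothetical helper names `rank` and `is_solvable`; it brings each matrix into echelon form by row operations (with a column skipped when no pivot is available, which plays the role of the column interchange) and compares the two ranks.

```python
from fractions import Fraction

def rank(M):
    """Rank of M (a list of rows), found by reduction to echelon form."""
    rows = [[Fraction(v) for v in row] for row in M]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for c in range(ncols):
        # Find a row at or below position r with a nonzero entry in column c.
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]          # interchange of rows
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            # Subtract a multiple of the pivot row from a later row.
            rows[i] = [v - f * w for v, w in zip(rows[i], rows[r])]
        r += 1
    return r

def is_solvable(A, b):
    """True if A x = b has a solution: rank(A) equals rank of (A | b)."""
    extended = [list(row) + [bk] for row, bk in zip(A, b)]
    return rank(A) == rank(extended)

A = [[1, 2],
     [2, 4]]
solvable = is_solvable(A, [3, 6])      # second equation is twice the first
unsolvable = is_solvable(A, [3, 7])    # the equations contradict each other
```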


The above proof of this theorem contains at the same time a procedure for 
the numerical solution of a given system of equations which is more convenient 
in practice than, for example, the method of solution by means of determinants. 


It remains to justify our use here of the word "rank" instead of "column
rank" of a matrix. Among the admissible operations for the transformation
of a matrix we have made no mention of the addition of a multiple of one
column to another column. But here also the number of linearly
independent column vectors remains unchanged, as is shown by the
theorem: if $a_1, \ldots, a_n$ are linearly dependent or linearly independent, then
the same is true for $a_1, a_2 + a_1 s, a_3, \ldots, a_n$. The proof is left to the reader.

Since it is clear that rows and columns of a matrix are on an equal 
footing with each other, we can also regard the rows as coordinates of 
vectors (in a space other than that of the column vectors), in which case 
the S-multiplication will be on the left. To determine the number of 
linearly independent row vectors we make use of the same operations as 
for the column vectors. Thus the resulting echelon matrix has the same 
number of nonzero elements in the main diagonal. This method shows 
that the row rank of a matrix is equal to the column rank, so that we may 
speak simply of the rank of the matrix. 


2.5. Transformation of a Matrix into Diagonal Form 


By the addition of multiples of columns to other columns we can bring
the echelon matrix $\mathfrak{A}'$ into the diagonal form

$$\mathfrak{D} = \begin{pmatrix} d_1 & & & \\ & \ddots & & 0 \\ & & d_r & \\ & 0 & & 0 \end{pmatrix}$$

(with zeros everywhere except on the main diagonal). In fact, exactly this
process is carried out, in a somewhat indirect way, in the ordinary method
of solution of a system of linear equations corresponding to an echelon
matrix. A more important interpretation of the process is described in
§2.2.

Our first remark is that the operations used above to transform the
matrix $\mathfrak{A}$, namely:


interchange of two rows or columns,
addition of a multiple of a row or column to another row or column,


can be effected by multiplication of $\mathfrak{A}$ with suitably chosen regular
matrices.

For let $\mathfrak{U}_{ij}$ be the matrix that arises from the unit matrix $\mathfrak{E}$ by interchange of the $i$th
and $j$th rows (or what amounts to the same thing, of the $i$th and $j$th
columns). Then the matrix $\mathfrak{A}\mathfrak{U}_{ij}$ arises from $\mathfrak{A}$ by interchange of the $i$th
and $j$th columns, and $\mathfrak{U}_{ij}\mathfrak{A}$ arises by interchange of the $i$th and $j$th rows.

If $\mathfrak{V}_j^i(q)$ is the matrix that arises from $\mathfrak{E}$ by adding the element $q$ in
the position of row $i$ and column $j$, then $\mathfrak{A}\mathfrak{V}_j^i(q)$ arises from $\mathfrak{A}$ by addition of the $q$-fold
multiple of the $i$th column to the $j$th column, and $\mathfrak{V}_j^i(q)\mathfrak{A}$ arises by
addition of the $q$-fold multiple of the $j$th row to the $i$th row.

The matrices $\mathfrak{U}$, $\mathfrak{V}$ are square, whereas $\mathfrak{A}$ may be rectangular, in which
case the matrices used for right multiplication will not have the same
number of rows as those used for left multiplication.
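
A small numerical illustration of these two kinds of matrices may be helpful. The sketch below assumes integer entries and 0-based indices (so row and column numbers are one less than in the text); the function names `interchange_matrix` and `addition_matrix` are hypothetical labels for the matrices obtained from the unit matrix as just described.

```python
def unit_matrix(n):
    """The n-by-n unit matrix as a list of rows."""
    return [[int(i == j) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Ordinary matrix product X Y of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def interchange_matrix(n, i, j):
    """Unit matrix with rows i and j interchanged."""
    U = unit_matrix(n)
    U[i], U[j] = U[j], U[i]
    return U

def addition_matrix(n, i, j, q):
    """Unit matrix with q added in row i, column j."""
    V = unit_matrix(n)
    V[i][j] += q
    return V

A = [[1, 2, 3],
     [4, 5, 6]]          # rectangular: 2 rows, 3 columns

# Right multiplication acts on columns and needs 3-by-3 matrices.
cols_swapped = matmul(A, interchange_matrix(3, 0, 2))
col_added = matmul(A, addition_matrix(3, 0, 1, 10))   # column 1 += 10 * column 0

# Left multiplication acts on rows and needs 2-by-2 matrices.
rows_swapped = matmul(interchange_matrix(2, 0, 1), A)
row_added = matmul(addition_matrix(2, 0, 1, 10), A)   # row 0 += 10 * row 1
```

The different sizes used on the left and on the right reflect the remark that the matrices for right multiplication need not have the same number of rows as those for left multiplication.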

The matrices $\mathfrak{U}$, $\mathfrak{V}$ are regular, since they represent transformations of
basis. Since the product of regular matrices is regular, we obtain the
theorem: a matrix $\mathfrak{A}$ can be transformed by multiplication with suitably
chosen regular matrices $\mathfrak{S}$, $\mathfrak{T}$ into a diagonal matrix


$\mathfrak{D} = \mathfrak{S}\mathfrak{A}\mathfrak{T}$.


Then by multiplication with regular matrices $\mathfrak{D}$ can be further transformed
into $\mathfrak{E}_r$, by left-multiplication with $\mathfrak{E}$ and right-multiplication
with the matrix that arises from $\mathfrak{E}$ through replacement of the first $r$
diagonal elements by $1/d_1, \ldots, 1/d_r$.

Since the inverse $\mathfrak{S}^{-1}$ of an arbitrary regular matrix $\mathfrak{S}$ is regular, we
have obtained another proof for the theorem: every matrix of rank $r$ is
equivalent to the matrix $\mathfrak{E}_r$.

This partition into equivalence classes is rather coarse, since the possi- 
bilities for transforming matrices into one another are still extremely 
numerous. In what follows we shall restrict the transformations in various 
ways, examining only the simplest case in detail. 

One restriction consists of regarding the vectors $x$ and $y$ as elements of
the same space, so that the same transformation is applied to both of them.
Matrices that are in the relation


(1) $\mathfrak{A}' = \mathfrak{T}^{-1}\mathfrak{A}\mathfrak{T}$

are said to be similar, in which case the matrix $\mathfrak{A}$ must be square. In §3.7
we shall examine some invariants under similarity.

A further specialization refers to the admissible transformations. For 
example, the orthogonal transformations (see the following section and 
II.7) are important for geometry. We shall discuss a question of this sort in 
§3.7. 


2.6. Linear Forms 


In our discussion of systems of linear equations we have encountered 
three vector spaces: the space of solutions, the space of column vectors, and 
the space of row vectors. Multiplication with elements of S was on the left 
for row vectors, and on the right for column and solution vectors. Let us 
now examine the relationship between solution vectors and row vectors in 
the case of a single equation, 


Here we regard the $x^\nu$ as coordinates of a vector $x$ in a space $V_n$. If
$a_1, \ldots, a_n$ are given elements of $S$, the mapping


(1) $x \to \langle \alpha, x \rangle = a_1 x^1 + \cdots + a_n x^n = a_\nu x^\nu$


assigns to every element $x$ of $V_n$ an element $\langle \alpha, x \rangle$ of $S$. This mapping $\alpha$
has the properties:


(Lv) From $x = y$ follows $\langle \alpha, x \rangle = \langle \alpha, y \rangle$,
(La) $\langle \alpha, x + y \rangle = \langle \alpha, x \rangle + \langle \alpha, y \rangle$,
(Lm) $\langle \alpha, xs \rangle = \langle \alpha, x \rangle \cdot s$,


and is thus a linear mapping of $V_n$ into $S$, a special case of the mappings
considered in §2.1. Here $k = 1$ and 1 is regarded as a basis of $S$ (as a $V_1$).
The matrix corresponding to the mapping $\alpha$ consists of a single row
$(a_\nu^1) = (a_\nu)$; this matrix will also be denoted by the letter $\alpha$.


The sign of equality in $\alpha = (a_\nu)$ and in $x = (x^\nu)$ refers to the representation
of $\alpha$ and $x$ with respect to a given basis. As a sign of equality that is valid only
with respect to a given basis, Schouten uses the symbol $\overset{*}{=}$. We consider it
unnecessary to introduce any special symbol in our present context.


By §2.1 every linear mapping of $V_n$ into $S$ can be represented in the
form (1). Such mappings are called linear forms.² They constitute an
$n$-dimensional vector space $L_n$ with $S$ as the domain of left multipliers,
namely, the space of linear forms or the dual space of $V_n$, also called a
module of linear forms.


² In IBS, §3.9, on the other hand, a linear form will mean a linear homogeneous
polynomial.


For vectors as "one-dimensional" matrices we have used lower-case letters;
so now in the same way we use lower-case (instead of capital) letters for the
linear mappings of $V_n$ into $S$. The notation $\langle \alpha, x \rangle$ is intended to emphasize the
symmetry of the two "factors."


After choice of a basis $(e_\nu)$ for $V_n$ and $\bar e_1 = 1$ for $S$, the linear forms $e^\mu$
defined by $\langle e^\mu, e_\nu \rangle = \delta_\nu^\mu$ constitute a basis for $L_n$. These linear forms
correspond to the mappings E§ (for k = 1). 

Since the bases of V,, and L, have been put in correspondence with each 
other in this way, a transformation of basis in V,, corresponds to a 
transformation in ZL, . Let us examine the effect of such a transformation 
on the coordinates. 

Let the image of $x$ under the linear mapping $\alpha$ be represented in terms of
one basis of $V_n$ by $\langle \alpha, x \rangle = a_\nu x^\nu$, and in terms of another by $\langle \alpha, x \rangle = a_{\nu'} x^{\nu'}$.
Then, since the linear transformation is independent of the choice of
basis, it follows that for a transformation which takes $x^\nu$ into


(2) $x^{\nu'} = t_\nu^{\nu'} x^\nu$,

the $a_\nu$ must be transformed in such a way that

$a_{\nu'} x^{\nu'} = a_\nu x^\nu$.


By substituting the inverse transformation $x^\nu = t_{\mu'}^{\nu} x^{\mu'}$ of (2) into this
equation we obtain:


(3) $a_{\mu'} x^{\mu'} = a_\nu t_{\mu'}^{\nu} x^{\mu'}$.

Thus the transformation

(4) $a_{\mu'} = a_\nu t_{\mu'}^{\nu}$


produces the desired result, and no other transformation can do so, as is
clear from the fact that (3) must hold for every vector $x$ and therefore in
particular for $x^{1'} = 1$, $x^{2'} = \cdots = x^{n'} = 0$, and so forth.

In order to obtain the transformation matrix in (4) from the matrix
in (2), we must first form the inverse matrix $(t_{\mu'}^{\nu})$ and then sum over the
superscripts on the $t$'s rather than over the subscripts.

The equations of the transformation (4) are the same as for the basis
vectors:


(5) $e_{\mu'} = e_\nu t_{\mu'}^{\nu}$,

so that the $a_\nu$ are said to transform cogrediently with the basis vectors,
whereas the $x^\nu$ transform contragrediently; thus the linear forms $\alpha = (a_\nu)$
are called covariant vectors and the $x = (x^\nu)$ contravariant.


The advantage of using superscripts and subscripts is now clear, and the 
convention of summing over equal superscripts and subscripts is convenient 
because it takes an expression involving a covariant and a contravariant vector 
into a magnitude that is invariant (under transformation of coordinates). 
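
The invariance of $a_\nu x^\nu$ can be checked numerically. The sketch below is only an illustration with assumed sample values: it takes a $2 \times 2$ matrix for the transformation (2), transforms the $x^\nu$ with this matrix and the $a_\nu$ with its inverse (summing over the row index of the inverse, as in (4)), and compares the two values of the linear form. All function names are hypothetical.

```python
from fractions import Fraction

def pairing(a, x):
    """The value a_nu x^nu of the linear form (a_nu) on the vector (x^nu)."""
    return sum(a_nu * x_nu for a_nu, x_nu in zip(a, x))

def transform_contravariant(T, x):
    """New coordinates of x: matrix T applied to the column (x^nu)."""
    return [sum(T[i][j] * x[j] for j in range(len(x))) for i in range(len(T))]

def transform_covariant(T_inv, a):
    """New coordinates of the form: row (a_nu) times the inverse matrix."""
    return [sum(a[i] * T_inv[i][j] for i in range(len(a)))
            for j in range(len(T_inv[0]))]

def inverse_2x2(T):
    """Inverse of a regular 2-by-2 matrix, computed exactly."""
    (p, q), (r, s) = T
    det = Fraction(p * s - q * r)
    return [[s / det, -q / det], [-r / det, p / det]]

T = [[2, 1],
     [1, 1]]              # assumed sample transformation, determinant 1
x = [3, 5]                # contravariant coordinates
a = [7, -2]               # covariant coordinates

x_new = transform_contravariant(T, x)
a_new = transform_covariant(inverse_2x2(T), a)
before = pairing(a, x)
after = pairing(a_new, x_new)
```

With these sample values the coordinates change, but `before` and `after` agree.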


The mapping $\alpha \to \langle \alpha, x \rangle$ may be regarded as a mapping, defined by $x$,
of the dual vector space into the domain of scalars. More generally, we
also consider mappings of $V_n \times V_n \times \cdots$ or $L_n \times L_n \times \cdots$, and so forth,
into the domain of scalars. Let us describe the next simplest case. A
mapping of the set of pairs of contravariant vectors into the domain of
scalars


$x, y \to c = \Gamma(x, y)$


is called a bilinear form (cf. page 268) if it is linear in each variable; in
other words,


(La) $\Gamma(x_1 + x_2, y) = \Gamma(x_1, y) + \Gamma(x_2, y)$; $\ \Gamma(x, y_1 + y_2) = \Gamma(x, y_1) + \Gamma(x, y_2)$,
(Lm) $\Gamma(sx, y) = s \cdot \Gamma(x, y)$; $\ \Gamma(x, ys) = \Gamma(x, y) \cdot s$.


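Here we assume that $sx = xs$ and that $S$ is commutative, although for
the time being it would be sufficient to assume that $V$ is both a left and
right vector space. From (La), (Lm) it follows that for $x = x^\kappa e_\kappa$, $y = e_\lambda y^\lambda$:


(6) $\Gamma(x, y) = x^\kappa \cdot \Gamma(e_\kappa, e_\lambda) \cdot y^\lambda$,

and if we set $\Gamma(e_\kappa, e_\lambda) = g_{\kappa\lambda}$:

(7) $\Gamma(x, y) = x^\kappa g_{\kappa\lambda} y^\lambda$.


Under a transformation of coordinates §1.5 (1) the transformed values
of the $g_{\kappa\lambda}$ will be such that


$g_{\kappa'\lambda'} = \Gamma(e_{\kappa'}, e_{\lambda'}) = t_{\kappa'}^{\kappa} g_{\kappa\lambda} t_{\lambda'}^{\lambda}$.


The bilinear form $\Gamma$ is also called a covariant tensor of second order and
the $g_{\kappa\lambda}$ are its coordinates.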

In general, tensors can be defined as multilinear forms, i.e., as mappings
of systems of covariant and contravariant vectors into the domain of
scalars (cf. page 268) linear in each of the variables. For example,


$A(x, \alpha, y) = a_{\kappa}{}^{\lambda}{}_{\mu}\, x^\kappa a_\lambda y^\mu$

defines a tensor of the third order which is covariant with respect to two
indices and contravariant with respect to one. The coordinates of a tensor
are the coefficients of a multilinear form; for example, under a transformation
of coordinates we have


$a_{\kappa'}{}^{\lambda'}{}_{\mu'} = t_{\kappa'}^{\kappa}\, t_{\lambda}^{\lambda'}\, t_{\mu'}^{\mu}\, a_{\kappa}{}^{\lambda}{}_{\mu}$.

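Let us now raise the question: do there exist transformations under
which the coordinates of a covariant vector are transformed in the same
way as those of a contravariant vector? Under such a transformation
we must have for every vector $x = (x^\kappa)$ not only


(8) $x^{\kappa'} = t_\kappa^{\kappa'} x^\kappa$

but also

(9) $x^{\kappa'} = \sum_\kappa t_{\kappa'}^{\kappa} x^\kappa$.


If we solve (8) for the $x^\kappa$ and substitute in (9), we obtain


$x^{\kappa'} = \sum_\kappa t_{\kappa'}^{\kappa} x^\kappa = \sum_\kappa \sum_{\lambda'} t_{\kappa'}^{\kappa} t_{\lambda'}^{\kappa} x^{\lambda'}$.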


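By setting $(x^{\lambda'}) = (1, 0, \ldots, 0)$, $(0, 1, 0, \ldots, 0)$, and so forth we obtain the
conditions of orthogonality


$\sum_\kappa t_{\kappa'}^{\kappa} t_{\lambda'}^{\kappa} = 0$ for $\kappa' \neq \lambda'$


and of normality


$\sum_\kappa t_{\kappa'}^{\kappa} t_{\kappa'}^{\kappa} = 1$,

or taken together

(O1) $\sum_\kappa t_{\kappa'}^{\kappa} t_{\lambda'}^{\kappa} = \delta_{\kappa'\lambda'}$.

In the same way

(O2) $\sum_{\kappa'} t_{\kappa}^{\kappa'} t_{\lambda}^{\kappa'} = \delta_{\kappa\lambda}$.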


Conversely, let us assume (O) and (8). Then

$x^\kappa = t_{\kappa'}^{\kappa} x^{\kappa'}$ [solution of (8)],


$\sum_\kappa t_{\lambda'}^{\kappa} x^\kappa = \sum_\kappa \sum_{\kappa'} t_{\lambda'}^{\kappa} t_{\kappa'}^{\kappa} x^{\kappa'} = x^{\lambda'}$ by (O), so that (9) holds.

Thus we have the theorem: for transformations satisfying the conditions
(O), and only for such transformations, it is unnecessary to distinguish
between covariant and contravariant vectors.

These transformations are called orthogonal.

Let us repeat the argument in matrix notation. For this purpose we must
first define the transpose $\mathfrak{A}^T$ of a given matrix $\mathfrak{A} = (a_\nu^\kappa)$, which
is formed by interchanging rows and columns in $\mathfrak{A}$. In more
detailed notation: the element in row $\nu$ and column $\kappa$ of $\mathfrak{A}^T$ is the element
$a_\nu^\kappa$ that stands in row $\kappa$ and column $\nu$ of $\mathfrak{A}$.

We leave to the reader the proof that


(10) $(\mathfrak{A}^T)^T = \mathfrak{A}$, and if $S$ is commutative, $(\mathfrak{A}\mathfrak{B})^T = \mathfrak{B}^T\mathfrak{A}^T$.


Then we interpret $a_\nu x^\nu$ as the matrix product $a^T \cdot x$, where $x$ and $a$ each
consist of one column, so that $a^T$ consists of one row.
Now if $x$ is transformed into $x' = \mathfrak{T}x$, and if we are to have


$a^T x = (a^T)' x' = a'^T x' = a'^T \mathfrak{T} x$,


then we must have $a^T = a'^T \mathfrak{T}$, so that we must set $a' = (\mathfrak{T}^{-1})^T a$; that is,
$a$ must be transformed by the inverse transposed matrix, corresponding to
the passage from (2) to (4) on page 263.

In order that the transformation for $a$ be the same as for $x$, we must have
$a' = \mathfrak{T}a$, so that $(\mathfrak{T}^{-1})^T = \mathfrak{T}$, or $\mathfrak{T}^{-1} = \mathfrak{T}^T$. The equations


$\mathfrak{T}^T\mathfrak{T} = \mathfrak{T}\mathfrak{T}^T = \mathfrak{E}$


are the same as (O). Thus the conditions of orthogonality and normality
state that the inverse matrix is the same as the transpose.
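
The matrix form of condition (O) is easy to test on examples. The following sketch uses assumed sample matrices with rational entries: a rotation built from the Pythagorean triple 3, 4, 5, which satisfies the condition, and a shear, which does not. The helper name `is_orthogonal` is hypothetical.

```python
from fractions import Fraction

def transpose(M):
    """Interchange rows and columns of M (a list of rows)."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Ordinary matrix product X Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def is_orthogonal(T):
    """True if T^T T and T T^T both equal the unit matrix."""
    n = len(T)
    unit = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    return matmul(transpose(T), T) == unit and matmul(T, transpose(T)) == unit

# Rotation with cosine 3/5 and sine 4/5: the inverse is the transpose.
rotation = [[Fraction(3, 5), Fraction(-4, 5)],
            [Fraction(4, 5), Fraction(3, 5)]]

# A shear is regular but its inverse differs from its transpose.
shear = [[Fraction(1), Fraction(1)],
         [Fraction(0), Fraction(1)]]
```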


3. Products of Vectors 


3.1. General requirements 


The reader is already acquainted with several different products of
vectors. We define, with respect to a given basis,


a) the inner or scalar product by
$x \cdot y = x^1 y^1 + \cdots + x^n y^n$,
b) in a $V_3$ the vector product of two vectors by
$x \times y = (x^2 y^3 - x^3 y^2, \ x^3 y^1 - x^1 y^3, \ x^1 y^2 - x^2 y^1)$,
c) in a $V_2$ the complex product by (cf. IB8, §1)


$x \circ y = (x^1 y^1 - x^2 y^2, \ x^1 y^2 + x^2 y^1)$;

that is, if $x = x^1 + i x^2$, $y = y^1 + i y^2$, then
$x \circ y = x^1 y^1 - x^2 y^2 + i (x^1 y^2 + x^2 y^1)$.

It is not customary to use any special symbol, like the $\circ$ here, since this
product does not usually occur in the same context with the others.

Only the complex product satisfies all the rules for computation familiar 
from the multiplication of real numbers; but this complex product is 
possible only in a two-dimensional vector space. 

In (a) the product of two vectors is not itself a vector; in (b) it is true
that for every system of coordinates $x \times y$ is defined as a vector, but this
product is not independent of the coordinate system. The desired independence
can be attained only if we restrict ourselves to coordinate systems
that arise from the original system by orthogonal transformations (see
§2.6 and II, 7), or if we consider the product not as a vector but as a tensor
of second order with the 9 coordinates $x^\mu y^\nu - x^\nu y^\mu$ ($\mu, \nu = 1, 2, 3$).
Neither (a) nor (b) satisfies the associative law, and in both cases a product
can be equal to zero even though neither of the factors is equal to zero.
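
These remarks can be verified on small examples. The sketch below implements the three coordinate formulas (a), (b), (c) directly, with 0-based tuple indices and hypothetical function names, and exhibits a failure of associativity for the vector product, a vanishing inner product of nonzero factors, and the agreement of (c) with Python's built-in complex multiplication.

```python
def inner(x, y):
    """(a) inner product x^1 y^1 + ... + x^n y^n."""
    return sum(xi * yi for xi, yi in zip(x, y))

def cross(x, y):
    """(b) vector product in a three-dimensional space."""
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])

def complex_product(x, y):
    """(c) complex product in a two-dimensional space."""
    return (x[0] * y[0] - x[1] * y[1],
            x[0] * y[1] + x[1] * y[0])

e1, e2 = (1, 0, 0), (0, 1, 0)

# The vector product is not associative.
left = cross(cross(e1, e1), e2)     # (e1 x e1) x e2
right = cross(e1, cross(e1, e2))    # e1 x (e1 x e2)

# A product can vanish although neither factor does.
zero_inner = inner((1, 0), (0, 1))
zero_cross = cross(e1, (2, 0, 0))

# The complex product agrees with multiplication of complex numbers.
z = complex_product((1, 2), (3, 4))
w = complex(1, 2) * complex(3, 4)
```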

We now consider the problem of defining for vectors one or several 
operations that can reasonably be called ‘‘multiplication.”’ What must be 
required of such an operation? 


1) Multiplication will assign to two vectors a "product." We cannot
require that the product belong to the same vector space as the factors,
but we will require that it belong to some vector space over the same
domain of scalars $S$; this requirement means only that for the objects
which turn up as products of two vectors there is defined, or can be
defined, an addition and an S-multiplication.

On the other hand, the "factors" may come from different vector
spaces, although they must be over the same domain of scalars.
(P₁) By multiplication we shall mean a procedure which to each vector
$\bar a \in \bar V_k(S)$ and each vector $b \in V_l(S)$ assigns exactly one vector
$c = \Pi(\bar a, b) \in W_r(S)$, provided certain further requirements are met.
Thus, a multiplication is a mapping of the set $\bar V \times V$ into $W$, where
$\bar V \times V$ is the set of pairs $(\bar a, b)$ with $\bar a \in \bar V_k(S)$, $b \in V_l(S)$. Included in this
statement is the consistency of multiplication with equality:
(P₂) From $\bar a = \bar a'$ and $b = b'$ it follows that $\Pi(\bar a, b) = \Pi(\bar a', b')$.

2) If the product does not belong to the same vector space as the factors,
we can hardly expect a straightforward associative law. Next in importance
come the distributive laws:


(La) $\Pi(\bar a_1 + \bar a_2, b) = \Pi(\bar a_1, b) + \Pi(\bar a_2, b)$,
$\Pi(\bar a, b_1 + b_2) = \Pi(\bar a, b_1) + \Pi(\bar a, b_2)$,

which we shall require from a multiplication.




3) If in (La) we set $\bar a_1 = \bar a_2$ and $b_1 = b_2$, we obtain


$\Pi(2\bar a, b) = 2\,\Pi(\bar a, b)$,
$\Pi(\bar a, b2) = \Pi(\bar a, b)\,2$,


provided we assume that $\bar V$ is a left vector space, $V$ a right vector space,
and $W$ both a left and right vector space over $S$. This assumption is
satisfied if, for example, $S$ is commutative and $sa = as$ is defined in all
three vector spaces; the assumption with respect to $W$ is also satisfied for
noncommutative $S$ if $r = 1$ and $W = S$, which is sufficient for our
present purposes. Thus we demand from a "multiplication" that the two
rules written above for the number 2 shall hold for arbitrary $s$ in $S$:


(Lm) $\Pi(s\bar a, b) = s\,\Pi(\bar a, b)$,
$\Pi(\bar a, bs) = \Pi(\bar a, b)\,s$.


Then by (P), (La), (Lm) the multiplication is a mapping of $\bar V \times V$
into $W$ which is linear with respect to each of the two factors; such
a mapping is called bilinear, or in the case of several factors multilinear,
and if $W = S$ is the domain of scalars, it is also called a bilinear form or a
multilinear form.


For later use let us note that if $\bar V = V$, then to the bilinear form $\Pi$ we can
assign a quadratic form, namely the mapping $a \to \Pi(a, a)$.


Thus we have set up the requirements that must be satisfied by an 
operation if it is to be called a ‘‘multiplication.”’ But how are we to give a 
concrete definition of such a multiplication? We must state some rule for 
assigning a product to every pair of vectors. Now, a given vector can, 
on the one hand, be defined geometrically, and in this case we must give 
a geometric definition of multiplication. This problem will be dealt with 
in II,7. 

On the other hand, a vector can also be defined by its coordinates, after 
choice of a basis. Then the rule for multiplication will determine the 
coordinates of the product from those of the factors and we must insure 
that the result is independent of the special choice of basis. 


If we let $(\bar e_\kappa)$, $\kappa = 1, \ldots, k$, be a basis of $\bar V_k$ and $(e_\lambda)$, $\lambda = 1, \ldots, l$, a basis
of $V_l$, it follows from (La) and (Lm), exactly as for linear mappings, that


$\Pi(\bar a, b) = \Pi(\bar a^\kappa \bar e_\kappa, e_\lambda b^\lambda) = \bar a^\kappa\, \Pi(\bar e_\kappa, e_\lambda)\, b^\lambda$.


Thus, a multiplication is completely defined by the products of the basis
vectors. For abbreviation we write


$\Pi(\bar e_\kappa, e_\lambda) = \Pi_{\kappa\lambda}$.

These products belong to $W$, but this fact is not very helpful, at least not
yet, because up to now we have said nothing about the space $W$. In fact,
it will be necessary, at least to some extent, to construct $W$. But to do this
in a suitable way we must first investigate the behavior of products under
a transformation of basis.

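For a given transformation


(1) $\bar e_{\kappa'} = \bar t_{\kappa'}^{\kappa} \bar e_\kappa$, $\quad e_{\lambda'} = e_\lambda t_{\lambda'}^{\lambda}$

in which the coordinates $\bar a^\kappa$, $b^\lambda$ become

(2) $\bar a^{\kappa'} = \bar a^\kappa \bar t_{\kappa}^{\kappa'}$, $\quad b^{\lambda'} = t_{\lambda}^{\lambda'} b^\lambda$,


it will be necessary to define a transformation of the $\Pi_{\kappa\lambda}$ into $\Pi_{\kappa'\lambda'}$ in such
a way that the product remains invariant; that is, for all vectors $(\bar a^\kappa)$, $(b^\lambda)$
we must have


(3) $\bar a^\kappa \Pi_{\kappa\lambda} b^\lambda = \bar a^{\kappa'} \Pi_{\kappa'\lambda'} b^{\lambda'}$.

From (2) and (3) it follows that

(T) $\Pi_{\kappa'\lambda'} = \bar t_{\kappa'}^{\kappa}\, \Pi_{\kappa\lambda}\, t_{\lambda'}^{\lambda}$.


In order for the product to remain invariant under transformation of
basis, we must assign to the transformations (1) of $\bar V$ and $V$ a transformation
of the form (T) in $W$. In the following two sections we shall describe
two possibilities, and in each case the existence of a "multiplication"
with the required properties will be proved by our giving an explicit
statement (in coordinates) of what the products are.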


3.2. The Inner or Scalar Product 


Let $\bar V = V$ and $W = S$, so that $r = 1$. If we write $g_{\kappa\lambda}$ in place of $\Pi_{\kappa\lambda}$,
then (T) states that the $g_{\kappa\lambda}$ must be the coordinates of a covariant tensor
of second order. Thus to define this multiplication we must first choose a
basis $(e_\kappa)$ in $V$ and then choose arbitrary numbers $g_{\kappa\lambda}$. With respect to this
basis the inner or scalar product is defined by


(1) $a \cdot b = a^\kappa g_{\kappa\lambda} b^\lambda$.


If we make a transformation of basis, then instead of the $g_{\kappa\lambda}$ we must use
the numbers


(2) $g_{\kappa'\lambda'} = t_{\kappa'}^{\kappa} g_{\kappa\lambda} t_{\lambda'}^{\lambda}$

in order to form the product; that is, we must have

$a \cdot b = a^{\kappa'} g_{\kappa'\lambda'} b^{\lambda'}$.


It is obvious that the requirements (P), (La), (Lm) are satisfied by the
product defined in (1).

If in a given vector space we have defined a covariant tensor of second
order, namely a bilinear form, as the "fundamental tensor" or "fundamental
form" and have thereby defined an inner product, we say that the
vector space has a metric structure, or that it is a metric space, a name that
is explained by the fact that the inner product can be used for the introduction
of a metric. For if $S$ is the field of real numbers and if the quadratic
form $x^\kappa g_{\kappa\lambda} x^\lambda$ is positive definite, then the mapping $d$, defined by


$d(a, b) = \sqrt{(b^\kappa - a^\kappa)\, g_{\kappa\lambda}\, (b^\lambda - a^\lambda)} = \sqrt{(b - a) \cdot (b - a)}$,


satisfies the requirements for a distance function.
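
As an illustration of this definition, the sketch below evaluates the inner product and the distance for an assumed sample fundamental tensor $g_{\kappa\lambda}$ (a symmetric, positive definite $2 \times 2$ matrix chosen only for the example) and checks the distance axioms on three sample points. The function names are hypothetical, and floating-point square roots are used, so the comparisons are approximate.

```python
import math

def inner(a, b, G):
    """Inner product a^k g_kl b^l with respect to the fundamental tensor G."""
    return sum(a[k] * G[k][l] * b[l]
               for k in range(len(a)) for l in range(len(b)))

def distance(a, b, G):
    """d(a, b) = sqrt((b - a) . (b - a)); G is assumed positive definite."""
    diff = [bk - ak for ak, bk in zip(a, b)]
    return math.sqrt(inner(diff, diff, G))

# Assumed sample fundamental tensor: symmetric and positive definite.
G = [[2, 1],
     [1, 2]]

p, q, r = [0, 0], [1, 0], [1, 1]
d_pr = distance(p, r, G)
d_pq = distance(p, q, G)
d_qr = distance(q, r, G)
```

With the unit matrix in place of `G` the same function gives the ordinary Euclidean distance.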

Let us now suppose that for a given basis an inner product has been
defined by the coordinates $g_{\kappa\lambda}$ of the fundamental tensor. We ask whether
it is possible to choose a new basis in such a way that the representation
of this product will become especially simple; for example, the matrix
$(g_{\kappa'\lambda'})$ resulting from the transformation (2) will be a diagonal matrix,
or if possible the unit matrix.

In matrix notation, (2) becomes


(2′) $\mathfrak{G}' = \mathfrak{T}^T\mathfrak{G}\mathfrak{T}$.


Two matrices $\mathfrak{G}$, $\mathfrak{G}'$ related to each other in this way are said to be
congruent. The difference between similarity (§2.5 (1)) and congruence
consists in the fact that for similarity the matrix $\mathfrak{A}$ represents a mixed
tensor, covariant with respect to one index and contravariant with respect
to the other, whereas for congruence the matrix represents a doubly
covariant tensor.

If $\mathfrak{G}'$ is to be a diagonal matrix, then it must at least be symmetric; i.e.,
$\mathfrak{G}'^T = \mathfrak{G}'$. If $S$ is commutative, as we shall assume from now on, then


$\mathfrak{G}'^T = \mathfrak{T}^T\mathfrak{G}^T\mathfrak{T}$ (cf. §2.6 (10)).


By multiplication with $(\mathfrak{T}^T)^{-1}$ and $\mathfrak{T}^{-1}$ we see that $\mathfrak{G}'^T = \mathfrak{G}'$ if and only if
$\mathfrak{G}^T = \mathfrak{G}$; in other words, the symmetry of a matrix remains unchanged
by transformation to a congruent matrix.

We now assume that $\mathfrak{G}$ is symmetric. Then the transformation to a
diagonal matrix can be effected, exactly as in §2.5, by multiplication with
matrices $\mathfrak{U}$, $\mathfrak{V}$; for right-multiplication with $\mathfrak{U}$ and $\mathfrak{V}$ performs on the
columns exactly the same operations that left-multiplication with $\mathfrak{U}^T$ and
$\mathfrak{V}^T$ performs on the rows, as the reader may easily verify.

If $S$ is the field of complex numbers, it is appropriate to introduce
another concept: for the matrix $\mathfrak{G} = (g_{\kappa\lambda})$ we define the conjugate
transposed matrix $\mathfrak{G}^* = (g^*_{\kappa\lambda})$ by


$g^*_{\kappa\lambda} = \bar g_{\lambda\kappa}$; $\quad \mathfrak{G}^* = \bar{\mathfrak{G}}^T = \overline{\mathfrak{G}^T}$,

where the bars denote complex conjugates. From the one-column matrix $x$
we obtain the one-row matrix $x^*$ with the elements $x^{*\kappa} = \bar x^\kappa$.
A matrix with the property


$\mathfrak{G}^* = \mathfrak{G}$, i.e., $g_{\kappa\lambda} = \bar g_{\lambda\kappa}$,


is called a Hermitian matrix. The diagonal elements of a Hermitian matrix
are real ($g_{\kappa\kappa} = \bar g_{\kappa\kappa}$).

If $\mathfrak{G}$ is a Hermitian matrix, the mapping $(x, y) \to x^*\mathfrak{G}y = \bar x^\kappa g_{\kappa\lambda} y^\lambda$ is
called a Hermitian bilinear form, and the mapping $x \to x^*\mathfrak{G}x$ is a Hermitian
form.

Under the coordinate transformation $x = \mathfrak{T}x'$, the matrix $\mathfrak{G}$ becomes
$\mathfrak{G}' = \mathfrak{T}^*\mathfrak{G}\mathfrak{T}$, so that a Hermitian matrix goes into a Hermitian matrix
($\mathfrak{G}'^* = \mathfrak{G}'$ if and only if $\mathfrak{G}^* = \mathfrak{G}$). Thus the property of being Hermitian
is independent of the choice of basis.

If in a Hermitian bilinear form we take the basis vectors as arguments,
we obtain, in view of $e_i = (\delta_i^\kappa)$,


$e_i^*\,\mathfrak{G}\,e_k = g_{ik}$,


or in other words exactly the coefficients of the bilinear form. This result
holds for any basis.

The values of a Hermitian form are real; for if $\bar x^T\mathfrak{G}x = w$, then
$\bar w = x^T\bar{\mathfrak{G}}\bar x$. Now $\bar w$ is a number, i.e., a one-row matrix, so that
$\bar w = \bar w^T = \bar x^T\bar{\mathfrak{G}}^T x$, and therefore, since $\bar{\mathfrak{G}}^T = \mathfrak{G}$, we have $\bar w = w$.

This statement does not hold for quadratic forms with complex argument,
a fact which explains why Hermitian forms are the appropriate ones in our
present discussion.
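
A numerical example may make the contrast concrete. The sketch below uses Python's built-in complex numbers with assumed sample values: it evaluates the Hermitian form $x^*\mathfrak{G}x$ for a Hermitian matrix and, for comparison, the quadratic form $x^T\mathfrak{G}x$ of the unit matrix at a complex argument. The function names are hypothetical.

```python
def is_hermitian(G):
    """True if g_kl equals the complex conjugate of g_lk for all k, l."""
    n = len(G)
    return all(G[k][l] == G[l][k].conjugate()
               for k in range(n) for l in range(n))

def hermitian_form(G, x):
    """Value of x* G x, i.e. the sum of conj(x^k) g_kl x^l."""
    n = len(x)
    return sum(x[k].conjugate() * G[k][l] * x[l]
               for k in range(n) for l in range(n))

def quadratic_form(G, x):
    """Value of x^T G x, without complex conjugation."""
    n = len(x)
    return sum(x[k] * G[k][l] * x[l] for k in range(n) for l in range(n))

G = [[2, 1 + 1j],
     [1 - 1j, 3]]          # assumed sample Hermitian matrix
x = [1 + 2j, 3 - 1j]       # assumed sample complex vector

w = hermitian_form(G, x)                               # real value
v = quadratic_form([[1, 0], [0, 1]], [1 + 1j, 0])      # not real
```

In this example `w` has vanishing imaginary part, whereas `v` is purely imaginary.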


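We now define an inner product by

(3) $\Pi(a, b) = a^*\mathfrak{G}b = \bar a^\kappa g_{\kappa\lambda} b^\lambda$.


In order that this definition may be independent of the choice of basis,
the matrix $\mathfrak{G}$ must be transformed according to the rule


(4) $\mathfrak{G}' = \mathfrak{T}^*\mathfrak{G}\mathfrak{T}$ or $g_{\kappa'\lambda'} = \bar t_{\kappa'}^{\kappa} g_{\kappa\lambda} t_{\lambda'}^{\lambda}$.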


If we restrict S to the field of real numbers, we obtain our earlier 
result: a real Hermitian matrix is a symmetric matrix, and a real 
Hermitian form is a quadratic form, so that (3), (4) become (1), (2). 

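A Hermitian matrix is taken into a Hermitian matrix by the transformation
(4), and can be transformed, in the same way as a real symmetric
matrix, into a (real) diagonal matrix. We omit the proof here, since the
next section proves a sharper result.

Moreover, by transformations of the form

(5) $\mathfrak{D}' = \mathfrak{S}^*\mathfrak{D}\mathfrak{S}$


with regular matrix $\mathfrak{S}$, the nonzero diagonal elements of a real diagonal matrix can
be transformed into $\pm 1$; for example:

$$\begin{pmatrix} \frac{1}{\sqrt{|d|}} & & \\ & 1 & \\ & & \ddots \end{pmatrix}
\begin{pmatrix} d & & \\ & \cdot & \\ & & \ddots \end{pmatrix}
\begin{pmatrix} \frac{1}{\sqrt{|d|}} & & \\ & 1 & \\ & & \ddots \end{pmatrix}
= \begin{pmatrix} \frac{d}{|d|} & & \\ & \cdot & \\ & & \ddots \end{pmatrix}.$$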

But there are other possibilities; for instance, we could also make the 
transformation 


xi = x, xx 


It is an important fact that for all such transformations the number of 
positive and negative terms remains the same (Sylvester’s law of inertia). 

The proof of this law is as follows. Transformation (5) does not change 
the rank r of the matrix (in view of this invariance, r is also called the rank 
of the form represented by the matrix), so that the number of nonzero 
diagonal elements remains the same in every diagonal representation. 
So let us assume that the Hermitian form has been brought by two
transformations (5) into the forms

(6) $\bar{x}^1 x^1 + \cdots + \bar{x}^p x^p - \bar{x}^{p+1} x^{p+1} - \cdots - \bar{x}^r x^r$
    $= \bar{y}^1 y^1 + \cdots + \bar{y}^q y^q - \bar{y}^{q+1} y^{q+1} - \cdots - \bar{y}^r y^r$,

where the $x^\kappa$ and $y^\kappa$ are related by $y^\kappa = s^\kappa_\lambda x^\lambda$, and we shall suppose that
$p > q$. Then the system of equations

$y^i = s^i_\lambda x^\lambda = 0$, $i = 1, \ldots, q$,
$x^j = 0$, $j = p+1, \ldots, r$,

has a nontrivial solution. If we substitute this solution in (6), then, since
not all $x^i$ are 0, the left side is $> 0$, and the right side is $\le 0$. Since a
corresponding contradiction can be derived from $p < q$, it follows that
$p = q$, as desired.

For real quadratic forms the theorem and the proof are the same with
restriction to real transformations $(s^\kappa_\lambda)$.

The difference between the number of positive and the number of 
negative terms is called the signature of the form. We have shown that the 
rank and signature of a Hermitian form or of a real quadratic form are
invariant under the transformations (5). They are also the only such
invariants, since every such form can be transformed by (5) into a form
with diagonal matrix and with diagonal terms $\pm 1$ and $0$.
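
Sylvester's law of inertia can be illustrated on a small real example. The sketch below is an assumption-laden illustration, not the method of the text: it counts the signs of the pivots met in Gaussian elimination on a symmetric matrix and assumes that every pivot is nonzero. The matrices `D` and `S` and the function names are hypothetical.

```python
from fractions import Fraction

def congruence(s, d):
    """Return S^T D S for real square matrices given as lists of rows."""
    n = len(d)
    ds = [[sum(d[i][k] * s[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(s[k][i] * ds[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inertia(m):
    """Return (number of positive pivots, number of negative pivots).

    Assumption of this sketch: no zero pivot occurs, so no row
    interchanges are needed; otherwise a ValueError is raised.
    """
    a = [[Fraction(v) for v in row] for row in m]
    n = len(a)
    pos = neg = 0
    for i in range(n):
        pivot = a[i][i]
        if pivot == 0:
            raise ValueError("zero pivot: not handled in this sketch")
        if pivot > 0:
            pos += 1
        else:
            neg += 1
        for r in range(i + 1, n):
            factor = a[r][i] / pivot
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return pos, neg

D = [[2, 0, 0], [0, -3, 0], [0, 0, 5]]   # two positive terms, one negative
S = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]    # a regular matrix
print(inertia(D), inertia(congruence(S, D)))
```

Both matrices yield the same pair of counts, as the law of inertia requires; the signature is the difference of the two counts.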

If and only if the signature is equal to the rank, will all the values of the
Hermitian form $x^*\mathfrak{G}x$ be nonnegative; and if, in addition, the rank is
equal to the dimension of $V$, then the form will assume the value 0 only for
$x = o$. Such a form is said to be positive definite.

Our discussion has now led to the result: if an inner product with respect
to a given basis is represented by a positive definite Hermitian form, then
with respect to a suitably chosen basis it can be represented in the form

$x^*y = \sum_\kappa \bar{x}^\kappa y^\kappa$,

i.e., by the unit matrix.

We now ask which transformations will leave this form of the representation
unchanged. By (4) it will be those transformations for which
$\mathfrak{T}^*\mathfrak{E}\mathfrak{T} = \mathfrak{E}$, or

$\mathfrak{T}^*\mathfrak{T} = \mathfrak{E}$;

such transformations are called unitary. If $S$ is restricted to the field of
real numbers, we obtain $\mathfrak{T}^T\mathfrak{T} = \mathfrak{E}$, or in other words the orthogonal
transformations. The corresponding matrices are also called unitary, or
orthogonal.

The set of unitary (or if real, orthogonal) transformations is already 
sufficient to transform any Hermitian (or if real, symmetric) matrix into 
a diagonal matrix. We shall prove this statement in §3.7, after we have 
introduced the concept of a determinant. 


3.3. The Tensor Product and the Outer Product 


A second possibility for introducing a multiplication with the properties
(P), (La), (Lm), (T) consists of regarding the $\Pi_{\kappa\lambda}$ as the basis vectors of a
vector space $W_r$, whose dimension is therefore $r = kl$. More precisely:
let $W_r$ be a vector space of dimension $r = kl$ with the basis $e_{\kappa\lambda}$. Then we
define the tensor product by

(1) $\Pi(\bar{e}_\kappa, e_\lambda) = e_{\kappa\lambda}$,

where it must be remembered that $S$ is assumed to be commutative; moreover,
we assume that $as = sa$ is defined in $\bar{V}$, $V$ and $W$.

The mapping defined by (1) is in general not a mapping of $\bar{V} \times V$ onto
$W$; for the images of the elements of $\bar{V} \times V$ are $\bar{a}^\kappa b^\lambda e_{\kappa\lambda}$, but the elements
of $W$ are the $q^{\kappa\lambda} e_{\kappa\lambda}$.
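The product will be invariant under basis transformations $(\bar{t}^{\kappa}_{\kappa'})$ in $\bar{V}$
and $(t^{\lambda}_{\lambda'})$ in $V$, if in $W$ we make the corresponding transformation

(T) $e_{\kappa'\lambda'} = \bar{t}^{\kappa}_{\kappa'}\, t^{\lambda}_{\lambda'}\, e_{\kappa\lambda}$.

We must still verify that (T) is a basis transformation, i.e., that the
$e_{\kappa'\lambda'}$ are linearly independent. But if we assume that

$c^{\kappa'\lambda'} e_{\kappa'\lambda'} = 0$,

then from (T) and from the fact that the $e_{\kappa\lambda}$ are linearly independent it
follows that

$c^{\kappa'\lambda'}\, \bar{t}^{\kappa}_{\kappa'}\, t^{\lambda}_{\lambda'} = 0$ for all $\kappa, \lambda$.

If (with arbitrary $\rho'$, $\sigma'$) we multiply by $\bar{t}^{\rho'}_{\kappa} t^{\sigma'}_{\lambda}$ and add, we obtain

$0 = c^{\kappa'\lambda'}\, \bar{t}^{\kappa}_{\kappa'} \bar{t}^{\rho'}_{\kappa}\, t^{\lambda}_{\lambda'} t^{\sigma'}_{\lambda} = c^{\rho'\sigma'}$.

By a transformation (T), the coordinates $q^{\kappa\lambda}$ of an element of $W$ are
transformed according to

$q^{\kappa\lambda} = \bar{t}^{\kappa}_{\kappa'}\, t^{\lambda}_{\lambda'}\, q^{\kappa'\lambda'}$.

If $\bar{V} = V$, so that $\bar{t}^{\kappa}_{\kappa'} = t^{\kappa}_{\kappa'}$, comparison with §2.6 shows that the elements
of $W$ are contravariant tensors of second order with respect
to $V$. If $\bar{V}$ and $V$ are dual to each other, then the elements of $W$ are
mixed tensors.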

The tensor product is denoted by

$a \otimes b = \bar{a}^\kappa b^\lambda (\bar{e}_\kappa \otimes e_\lambda)$.

We have retained the distinction between the vector spaces $\bar{V}$ and $V$ in
order to make it easier to define the tensor product of several factors,
since we need only use $W$ in place of $\bar{V}$ or $V$, although then, of course, the
vector space to which the products belong is a new one.

We obtain

$(a \otimes b) \otimes c = \bar{a}^\kappa b^\lambda c^\mu ((\bar{e}_\kappa \otimes e_\lambda) \otimes e_\mu)$.

The product of arbitrarily many factors is now defined by induction.
However, we are in fact chiefly interested in the case in which all the
factors come from the same vector space $\bar{V} = V$, when we may write

$(a \otimes b) \otimes c = a^\kappa b^\lambda c^\mu ((e_\kappa \otimes e_\lambda) \otimes e_\mu)$,
$a \otimes (b \otimes c) = a^\kappa b^\lambda c^\mu (e_\kappa \otimes (e_\lambda \otimes e_\mu))$.
To begin with, the vectors $(e_\kappa \otimes e_\lambda) \otimes e_\mu$ and $e_\kappa \otimes (e_\lambda \otimes e_\mu)$ are to be
regarded as basis elements of two distinct vector spaces. But since these
spaces have the same dimension $k^3$, we can set up an isomorphism between
them by an arbitrary one-to-one correspondence of their basis elements,
whereupon we regard the basis elements assigned to each other by
$(e_\kappa \otimes e_\lambda) \otimes e_\mu \leftrightarrow e_\kappa \otimes (e_\lambda \otimes e_\mu)$ as being "the same." But now the tensor
product is obviously associative, and we can write

(2) $a \otimes b \otimes c = a^\kappa b^\lambda c^\mu (e_\kappa \otimes e_\lambda \otimes e_\mu)$.
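
In coordinates, the identification that makes the tensor product associative amounts to listing the products of coordinates in one fixed order of the index triples. The following sketch assumes a hypothetical flat, row-major storage of the coordinates; the function name `tensor` and the sample vectors are illustrative only.

```python
def tensor(a, b):
    """Coordinates of a (x) b, flattened row-major.

    The coordinate belonging to the basis element e_k (x) e_l is
    a[k] * b[l]; the pair (k, l) is stored at position k * len(b) + l.
    """
    return [x * y for x in a for y in b]

a, b, c = [1, 2], [3, 5], [7, 11]

left = tensor(tensor(a, b), c)    # (a (x) b) (x) c
right = tensor(a, tensor(b, c))   # a (x) (b (x) c)
print(left == right)              # same coordinates in the same order
print(tensor(a, b), tensor(b, a)) # the product is not commutative
```

With this storage convention both bracketings produce the same list of $k^3$ coordinates, whereas interchanging the factors changes the coordinates.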


The tensor product is not the only bilinear mapping of $V \times V$ into $W_r$.
Like the linear mappings (see above), the bilinear and multilinear mappings
form a vector space under appropriately defined S-multiplication. We
shall consider only the following special case:

If $(a, b) \to a \otimes b$, and consequently also $(a, b) \to b \otimes a$, are bilinear
mappings, then so are

(3) $(a, b) \to a \vee b = a \otimes b + b \otimes a$,
(4) $(a, b) \to a \wedge b = a \otimes b - b \otimes a$,³

so that these two mappings may be regarded as products. Since

$a \vee b = b \vee a$,

the first of these products could be called the symmetric product. More
important is the second one, which because of

$a \wedge b = -b \wedge a$

is called the alternating product or, by Grassmann, the outer product. In
the present section we consider only this product.

As an element of $W$ the alternating product is a contravariant tensor of
second order. After choice of a basis $(e_\kappa)$ for $V$ and a corresponding basis
for $W$, the coordinates of this product are to be obtained from (4). If
we set

$a \wedge b = p^{\kappa\lambda}(e_\kappa \otimes e_\lambda)$,

we obtain from (4)

(5) $p^{\kappa\lambda} = a^\kappa b^\lambda - a^\lambda b^\kappa$.

From (5) follows $p^{\kappa\lambda} = -p^{\lambda\kappa}$, and in particular $p^{\kappa\kappa} = 0$. Thus the $k^2$
coordinates $p^{\kappa\lambda}$ are completely determined by the values of a suitably
chosen set of $k(k-1)/2$ of these coordinates. In this sense we can say
that $a \wedge b$ has only $k(k-1)/2$ essentially distinct coordinates.

³ In the present context the symbols $\wedge$, $\vee$ are defined by (3) and (4), and do not
mean "and" and "or."

Furthermore, we can show that every skew-symmetric tensor
$p = (p^{\kappa\lambda})$ with $p^{\kappa\lambda} = -p^{\lambda\kappa}$ has this property; i.e., for the representation
of such a tensor it is sufficient to know $k(k-1)/2$ basis vectors, namely

$e_\kappa \wedge e_\lambda = e_\kappa \otimes e_\lambda - e_\lambda \otimes e_\kappa$ $(\kappa < \lambda)$.
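For if in

$p = \sum_{\kappa<\lambda} p^{\kappa\lambda}(e_\kappa \otimes e_\lambda) + \sum_{\lambda<\kappa} p^{\kappa\lambda}(e_\kappa \otimes e_\lambda)$

we make a change of summation indices in the second sum, we obtain

$p = \sum_{\kappa<\lambda} p^{\kappa\lambda}(e_\kappa \otimes e_\lambda) + \sum_{\kappa<\lambda} p^{\lambda\kappa}(e_\lambda \otimes e_\kappa)$,

from which, since $p^{\lambda\kappa} = -p^{\kappa\lambda}$, it follows that

$p = \sum_{\kappa<\lambda} p^{\kappa\lambda}(e_\kappa \otimes e_\lambda - e_\lambda \otimes e_\kappa) = \sum_{\kappa<\lambda} p^{\kappa\lambda}(e_\kappa \wedge e_\lambda)$.

It is to be noted that for the alternating product we have

$a \wedge b = \sum_{\kappa=1}^{k} \sum_{\lambda=1}^{k} a^\kappa b^\lambda (e_\kappa \wedge e_\lambda) = \sum_{\kappa<\lambda} (a^\kappa b^\lambda - a^\lambda b^\kappa)(e_\kappa \wedge e_\lambda)$.

Another important property is that the alternating product is equal to $o$
if and only if $a$, $b$ are linearly dependent.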


Proof. (a) If $a$, $b$ are linearly dependent, so that $pa + qb = o$ and
say $q \ne 0$, then $b = -(p/q)a$, and from (4) or (5) it follows that
$a \wedge b = o$.

(b) If $a \wedge b = o$, then

(6) $a^\kappa b^\lambda - a^\lambda b^\kappa = 0$ for all $\kappa, \lambda$;

so we wish to find two numbers $p$ and $q$, not both $= 0$, such that

$p a^\kappa + q b^\kappa = 0$ for all $\kappa$.

But we may assume $a^1 \ne 0$ (since if $a = o$, then $a$, $b$ are linearly dependent)
and therefore, by (6), we may take $p = -b^1$, $q = a^1$.
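
Formula (5) and the dependence criterion just proved can be tried out directly. This is a minimal sketch; the function names `outer` and `essential` and the sample vectors are hypothetical and serve only as an illustration.

```python
def outer(a, b):
    """Coordinates p[k][l] = a[k]*b[l] - a[l]*b[k] of a ^ b, as in (5)."""
    n = len(a)
    return [[a[k] * b[l] - a[l] * b[k] for l in range(n)] for k in range(n)]

def essential(p):
    """The k(k-1)/2 coordinates with k < l; they determine all the others."""
    n = len(p)
    return [p[k][l] for k in range(n) for l in range(k + 1, n)]

a = [1, 2, 3]
b = [4, 5, 6]
p = outer(a, b)
print(essential(p))                # three essentially distinct coordinates
print(essential(outer(a, [2, 4, 6])))  # b = 2a: every coordinate vanishes
```

The skew symmetry $p^{\kappa\lambda} = -p^{\lambda\kappa}$ is visible in the full array `p`, and the product of two linearly dependent vectors has only zero coordinates.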

The alternating product of more than two factors could also be regarded
as defined by (4). Then we would have

(7) $(a \wedge b) \wedge c = (a \otimes b - b \otimes a) \otimes c - c \otimes (a \otimes b - b \otimes a)$
    $= a \otimes b \otimes c - b \otimes a \otimes c - c \otimes a \otimes b + c \otimes b \otimes a$.

On the other hand,

$a \wedge (b \wedge c) = a \otimes b \otimes c - a \otimes c \otimes b - b \otimes c \otimes a + c \otimes b \otimes a$;

so that the operation $\wedge$ would not be associative. Of course, it would
satisfy the following law:

$(a \wedge b) \wedge c + (b \wedge c) \wedge a + (c \wedge a) \wedge b = o$.

This law holds for the vector product in a three-dimensional orthogonal
space (cf. I, 7, §1.9):

$(a \times b) \times c + (b \times c) \times a + (c \times a) \times b = o$.

Thus the definition by (7) is suitable if we wish to interpret the alternating
product as a vector product. But this interpretation is restricted to the
three-dimensional orthogonal space, and we also wish to preserve the
associative law. With this in mind, we argue as follows: on the right-hand
side of (7) the three vectors $a$, $b$, $c$ are not all on the same footing, since
only four of the six permutations of $a$, $b$, $c$ actually appear. Thus we simply
write down all the permutations, with a plus sign for the even permutations
and a minus sign for the odd ones, and replace (7) by the new definition:

$a \wedge b \wedge c = (a \wedge b) \wedge c = a \wedge (b \wedge c)$
    $= a \otimes b \otimes c + b \otimes c \otimes a + c \otimes a \otimes b$
    $- a \otimes c \otimes b - c \otimes b \otimes a - b \otimes a \otimes c$.


More generally: let $\pi 1, \ldots, \pi n$ be a permutation of the numbers $1, \ldots, n$, and
let $(-1)^\pi = +1$ or $-1$ according as $\pi$ is an even or an odd permutation (cf.
IB2, §15.3.2); then we define

(8) $a_1 \wedge a_2 \wedge \cdots \wedge a_n = \sum_\pi (-1)^\pi\, a_{\pi 1} \otimes a_{\pi 2} \otimes \cdots \otimes a_{\pi n}$.

This sum is to be taken over all permutations of the numbers $1, \ldots, n$.

In the case $n = 2$ we again obtain the definition (4), but (4) must be
restricted to the case that the two vectors $a$, $b$ belong to the vector space
$V$; then and only then do we have the freedom to define

$(a \wedge b) \wedge c = a \wedge (b \wedge c) = a \wedge b \wedge c$.


The alternating product is also denoted by square brackets:

$a_1 \wedge a_2 \wedge \cdots \wedge a_n = [a_1, a_2, \ldots, a_n]$.

If we interpret $a \wedge b$ as a vector product in a three-dimensional
orthogonal space, the expression $a \wedge b \wedge c$ now corresponds to $(a \times b) \cdot c$.
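
Definition (8) can be evaluated coordinate by coordinate. The sketch below is one possible illustration under stated assumptions: the tensor coordinates are stored in a dictionary keyed by index tuples (counted from 0), and the names `sign` and `wedge` as well as the sample vectors are hypothetical.

```python
from itertools import permutations, product

def sign(perm):
    """+1 for an even permutation, -1 for an odd one (inversion count)."""
    n = len(perm)
    inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
    return -1 if inversions % 2 else 1

def wedge(vectors):
    """Coordinates of a_1 ^ ... ^ a_n as defined by (8).

    The coordinate for the index tuple (i_1, ..., i_n) is the sum over all
    permutations pi of sign(pi) times the product of the i_s-th coordinate
    of a_{pi(s)}, taken over all slots s.
    """
    n = len(vectors)
    dim = len(vectors[0])
    coords = {}
    for index in product(range(dim), repeat=n):
        total = 0
        for perm in permutations(range(n)):
            term = sign(perm)
            for slot in range(n):
                term *= vectors[perm[slot]][index[slot]]
            total += term
        coords[index] = total
    return coords

a, b, c = [1, 2, 3], [0, 1, 4], [5, 6, 0]
w = wedge([a, b, c])
cross = [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]
triple = sum(x * y for x, y in zip(cross, c))   # (a x b) . c
print(w[(0, 1, 2)], triple)
```

For three vectors in three dimensions, the coordinate belonging to the index triple $(1, 2, 3)$ agrees with $(a \times b) \cdot c$, and interchanging two indices changes the sign.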
3.4. The Determinant


On the basis of the definition §3.3 (8) the alternating product has the
following properties:

(La) $[\ldots, a_i + a_i', \ldots] = [\ldots, a_i, \ldots] + [\ldots, a_i', \ldots]$

(The vectors represented here by the dots remain unchanged in each of
the successive steps of the summation.)

(Lm) $[\ldots, a_i s, \ldots] = [\ldots, a_i, \ldots]\, s$

(a) $[\ldots a_i \ldots a_j \ldots] = o$, if $a_i = a_j$ and if $i \ne j$.

From (a) and (La) it follows that

(a') $[\ldots a_i \ldots a_j \ldots] = -[\ldots a_j \ldots a_i \ldots]$,

since

$[\ldots a_i \ldots a_j \ldots] + [\ldots a_j \ldots a_i \ldots] = [\ldots a_i + a_j \ldots a_i + a_j \ldots] = o$.

If the characteristic of $S$ (cf. IB5, §1.11) is not equal to 2, so that $1 + 1 \ne 0$,
then (a) also follows from (a').

From these properties it follows that if $a_1, \ldots, a_k$ are linearly dependent,
then $[a_1 \ldots a_k] = o$. For then one of the vectors, say $a_k$, is a linear
combination of the others, and we have

$[a_1 \ldots a_{k-1}, a_1 c^1 + \cdots + a_{k-1} c^{k-1}]$
    $= [a_1 \ldots a_{k-1}, a_1]\, c^1 + \cdots + [a_1 \ldots a_{k-1}, a_{k-1}]\, c^{k-1}$
    $= o$ by (a).

We shall see later that $[a_1 \ldots a_k] = o$ only if the $a_1, \ldots, a_k$ are linearly
dependent, a fact closely related to the solvability of the system of
homogeneous equations

$a_\kappa^\nu x^\kappa = 0$ $(\kappa = 1, \ldots, k;\ \nu = 1, \ldots, n)$.


It is an important fact that the alternating product is determined in an
essentially unique way by the properties (La), (Lm), (a). We shall prove
this statement here only for the case $k = n$, by actually computing the
coordinates of $[a_1, \ldots, a_n]$ on the basis of these properties alone.

Let $e_1, \ldots, e_n$ be a basis of $V_n$, although it would be sufficient that all
$a_\kappa$ are expressible as linear combinations

$a_\kappa = e_\nu a_\kappa^\nu$;

in other words the $e_\nu$ need not be linearly independent.
Then by (La), (Lm) we have

$[a_1 \ldots a_n] = \sum_{\nu_1} \cdots \sum_{\nu_n} [e_{\nu_1} \ldots e_{\nu_n}]\, a_1^{\nu_1} \cdots a_n^{\nu_n}$.

From (a) it follows that the product $[e_{\nu_1} \ldots e_{\nu_n}] = o$ if two indices are
equal. Thus for the $\nu_1, \ldots, \nu_n$ we need to consider only the permutations
$\pi 1, \ldots, \pi n$ of the numbers $1, \ldots, n$:

$[a_1, \ldots, a_n] = \sum_\pi [e_{\pi 1}, \ldots, e_{\pi n}]\, a_1^{\pi 1} \cdots a_n^{\pi n}$.

By (a') we have $[e_{\pi 1}, \ldots, e_{\pi n}] = (-1)^\pi [e_1, \ldots, e_n]$, so that

(D) $[a_1, \ldots, a_n] = [e_1, \ldots, e_n] \sum_\pi (-1)^\pi a_1^{\pi 1} \cdots a_n^{\pi n}$.

Thus $[a_1, \ldots, a_n]$ is determined up to the "factor" $[e_1, \ldots, e_n]$. This factor
is a vector in $W_r$ $(r = n^n)$.

To a great extent, the present discussion can be made independent of
the preceding section. Given a system of vectors $a_1, \ldots, a_k$ in a space $V_n$,
let us set ourselves the problem of assigning to it a vector $[a_1 \ldots a_k]$, in
another space $W$, which is to be equal to zero if and only if the $a_1, \ldots, a_k$
are linearly dependent. Such a mapping must in any case satisfy condition
(a). Furthermore, it is reasonable, though not necessary, to demand (La)
and (Lm). For if, for example, both $a_1$ and $a_1'$ are linearly dependent on
$a_2, \ldots, a_k$, then $a_1 + a_1', a_2, \ldots, a_k$ are linearly dependent. But then
$[a_1, a_2, \ldots, a_k] = [a_1', a_2, \ldots, a_k] = o$ implies $[a_1 + a_1', a_2, \ldots, a_k] = o$,
which is exactly the case if (La) holds.

Furthermore, if $a_1, \ldots, a_k$ are linearly dependent, so are $a_1 s, a_2, \ldots, a_k$,
and then $[a_1, \ldots, a_k] = o$ implies $[a_1 s, a_2, \ldots, a_k] = o$, which is exactly
the case if (Lm) holds. Thus (La) and (Lm) are not proved (that would be
impossible) but they are motivated.

From the definition of a mapping we have the requirement:

(Lv) From $a_i = a_i'$ follows $[\ldots, a_i, \ldots] = [\ldots, a_i', \ldots]$.

Now by the argument of the present section alone we see that if a mapping
with the desired properties exists at all, then for $k = n$ it can only be the
mapping represented by (D). The numerical factor will be denoted by

(1) $\sum_\pi (-1)^\pi a_1^{\pi 1} \cdots a_n^{\pi n} = |a_\kappa^\nu| = |\mathfrak{A}| = A$

and will be called the determinant of the matrix $\mathfrak{A}$.
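
The permutation sum (1) can be written out directly as a short program. This is only a sketch for small matrices (the number of summands grows like $n!$); the storage convention `a[row][column]` and the function names are assumptions made here for illustration.

```python
from itertools import permutations

def sign(perm):
    """(-1)^pi: +1 for an even permutation, -1 for an odd one."""
    n = len(perm)
    inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
    return -1 if inversions % 2 else 1

def det(a):
    """Determinant as the sum over all permutations, following (1).

    a[r][c] holds the element in row r and column c; for each column c
    the factor taken is the element in row perm[c].
    """
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for col in range(n):
            term *= a[perm[col]][col]
        total += term
    return total

print(det([[1, 2], [3, 4]]))
print(det([[1, 2, 3], [0, 1, 4], [5, 6, 0]]))
print(det([[1, 1], [2, 2]]))   # two equal columns
```

The last call illustrates property (a): a matrix with two equal columns has determinant 0.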
If we wish, we may normalize the mapping (D) by choosing a basis $(e_\kappa)$
such that

(n) $[e_1, \ldots, e_n] = 1$.

In this sense we also speak of the determinant of the vector system $a_1, \ldots, a_n$
with respect to the given basis. The determinant is then a number, but
under a change of basis it is transformed like the system of coordinates of
an $n$th-order tensor whose coordinates either all vanish or are all alike except
for sign. Because of the alternation, we may simplify the transformation
as follows:

Let $a_\lambda = e_\kappa a_\lambda^\kappa = e_{\kappa'} a_\lambda^{\kappa'}$ with

(2) $e_{\kappa'} = e_\kappa t^\kappa_{\kappa'}$, so that $a_\lambda^\kappa = t^\kappa_{\kappa'} a_\lambda^{\kappa'}$.

Then

$[a_1, \ldots, a_n] = [e_1, \ldots, e_n] \cdot |a_\lambda^\kappa|$
    $= [e_{1'}, \ldots, e_{n'}] \cdot |a_\lambda^{\kappa'}|$.

But $[e_{1'}, \ldots, e_{n'}] = [e_1, \ldots, e_n] \cdot |t^\kappa_{\kappa'}|$, so that

(3) $|a_\lambda^\kappa| = |t^\kappa_{\kappa'}|\, |a_\lambda^{\kappa'}|$;


in other words, under a transformation of basis the given determinant is 
multiplied by the determinant of the matrix of the transformation. 

From (3) we can draw an important conclusion: (2) states (among other
things) that the matrix $\mathfrak{A} = (a_\lambda^\kappa)$ is the product of the matrices $\mathfrak{T} = (t^\kappa_{\kappa'})$
and $\mathfrak{A}' = (a_\lambda^{\kappa'})$, and therefore (3) means that

(4) $|\mathfrak{T}\mathfrak{A}'| = |\mathfrak{T}|\, |\mathfrak{A}'|$.

This relation is called the law of multiplication of determinants. We seem
to have proved it here only under the assumption that $\mathfrak{T}$ is a regular matrix,
but in (2) we may in fact consider an arbitrary matrix. In this case the
$e_{\kappa'}$ will not necessarily be linearly independent, but it was pointed out at the
time that the equation (D), the only equation used here, does not require
the vectors $e_\kappa$ to be linearly independent. Thus (4) is valid for arbitrary
matrices $\mathfrak{T}$, $\mathfrak{A}'$.
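
The law of multiplication (4), including the case of a matrix that is not regular, can be checked on small integer examples. This is a sketch only; the matrices and the helper names `det` and `matmul` are hypothetical choices for illustration.

```python
from itertools import permutations

def det(a):
    """Determinant by the permutation sum (1); a[r][c] is row r, column c."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[i] > perm[j]
                         for i in range(n) for j in range(i + 1, n))
        term = -1 if inversions % 2 else 1
        for col in range(n):
            term *= a[perm[col]][col]
        total += term
    return total

def matmul(t, a):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(t[i][k] * a[k][j] for k in range(len(a)))
             for j in range(len(a[0]))] for i in range(len(t))]

T_regular = [[2, 1], [1, 1]]
T_singular = [[1, 2], [2, 4]]      # determinant 0, so not regular
A_prime = [[3, 1], [5, 2]]

for T in (T_regular, T_singular):
    print(det(matmul(T, A_prime)), det(T) * det(A_prime))
```

In both cases the two printed numbers agree, in line with the remark that (4) holds for arbitrary matrices.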

Our argument shows that if there exists a mapping which is multilinear
(Lv), (La), (Lm) and alternating (a) and has the normalization (n), then it can
only be represented by the determinant (1). But does $A$ actually have the
property of vanishing if and only if $a_1, \ldots, a_n$ are linearly dependent?

At the beginning of the present section we proved:

If $a_1, \ldots, a_n$ are linearly dependent, then $[a_1, \ldots, a_n] = o$.

On the other hand, if the $a_1, \ldots, a_n$ are linearly independent, they can be
chosen as a basis, and the original basis vectors can be represented in the
form $e_\kappa = a_\nu e_\kappa^\nu$. But then by (D) we have

$1 = [e_1, \ldots, e_n] = [a_1, \ldots, a_n] \cdot |e_\kappa^\nu|$,

so that

$[a_1, \ldots, a_n] \ne 0$.

So the vanishing or nonvanishing of the determinant provides a
criterion for the linear dependence of a given system of $n$ vectors in $V_n$;
in other words, it determines whether a system of $n$ homogeneous linear
equations in $n$ unknowns has a nontrivial solution or only the solution
$x^1 = \cdots = x^n = 0$. Thus the most important step in the theory of systems
of linear equations has been taken. In order to provide a complete answer 
to the question whether an arbitrary system of k linear equations in n 
unknowns has a solution and, if so, how many solutions, we need only 
introduce certain refinements, to be described in the next two sections. 
For this purpose we require from the present section only the definition 
of the determinant of a matrix given by (1) above. The preceding discussion 
has motivated this definition, but if we are willing to adopt it without 
motivation, the theory of systems of linear equations can be developed 
independently of the theory of vector spaces. 


3.5. Rules for Calculation with Determinants 


1. In order to emphasize that we are taking over almost nothing from
the foregoing discussion, we repeat the definition of a determinant: the
determinant of the square matrix $\mathfrak{A} = (a_\kappa^\nu)$, $\kappa, \nu = 1, \ldots, n$, is the number

(D1) $A = |\mathfrak{A}| = |a_\kappa^\nu| = \sum_\pi (-1)^\pi a_1^{\pi 1} \cdots a_n^{\pi n}$.

2. Interchange of rows and columns. The transpose $\mathfrak{A}^T = (a'^{\nu}_{\kappa})$ of the
matrix $\mathfrak{A}$ is defined by $a'^{\nu}_{\kappa} = a^{\kappa}_{\nu}$. Its determinant is

$|\mathfrak{A}^T| = \sum_\pi (-1)^\pi a^{1}_{\pi 1} \cdots a^{n}_{\pi n}$.

In each of these summands let us make the permutation $\pi^{-1}$ in the factors.
Then

$|\mathfrak{A}^T| = \sum_\pi (-1)^\pi a_1^{\pi^{-1} 1} \cdots a_n^{\pi^{-1} n}$.

But since $\pi^{-1}$ is even or odd together with $\pi$, and since $\pi^{-1}$ also runs
through all possible permutations, we have the result

(D2) $|\mathfrak{A}^T| = |\mathfrak{A}|$.

Consequently, any rule for calculation that concerns the columns of a
matrix is also valid for the rows, and conversely.


3. Expansion of a determinant by the elements of a column (or of a row).
If in (D1) we combine all the summands containing $a_1^1$ and then all those
containing $a_1^2$, and so forth, we obtain the determinant in the form

$A = a_1^1 A_1^1 + a_1^2 A_2^1 + \cdots + a_1^n A_n^1$.

Here, for example,

$A_1^1 = \sum_{\pi'} (-1)^{\pi'} a_2^{\pi' 2} \cdots a_n^{\pi' n}$,

where $\pi'$ runs through all permutations of the numbers $2, \ldots, n$. But this
expression is exactly the determinant of the matrix obtained from $\mathfrak{A}$ by
deleting the first row and the first column. We speak here of an $(n-1)$-rowed
subdeterminant of $\mathfrak{A}$ or of $A$, and we shall later use the expression
"$r$-rowed subdeterminant" in the corresponding sense for rectangular
matrices. If we denote by $U_\kappa^\nu$ the subdeterminant obtained from $A$ by
striking out the $\kappa$th row and the $\nu$th column, we have: $A_\kappa^\nu = (-1)^{\kappa+\nu} U_\kappa^\nu$.
We leave it to the reader to verify this rule, and the following equations,
by actual calculation:

$A = \sum_{\kappa=1}^{n} a_i^\kappa A_\kappa^i$ (for every $i$; no summation over $i$)
  $= \sum_{\nu=1}^{n} a_\nu^j A_j^\nu$ (for every $j$; no summation over $j$).

By forming the sum $\sum_\kappa a_i^\kappa A_\kappa^j$ with $i \ne j$, we obtain the determinant of the matrix
formed from $\mathfrak{A}$ by deleting the $j$th column and replacing it by the $i$th
column. But this matrix contains two equal columns, so that its
determinant is 0, a fact which we can either take from §3.4 (a) or derive
directly from (D1). We thus obtain, together with the above equations,

(D3) $a_i^\kappa A_\kappa^j = \delta_i^j \cdot A$; $a_\kappa^i A_j^\kappa = \delta_j^i \cdot A$.

Here again we take summation over equal indices.


4. The Laplace expansion. The result (D3) can be generalized in the
following way: instead of the 1-rowed subdeterminants $(a_i^\kappa)$ consisting of
the elements of a fixed column, we can consider the $q$-rowed subdeterminants
formed from $q$ fixed columns. Let these columns be numbered
$i_1, \ldots, i_q$, let the rows of such a subdeterminant be numbered $\kappa_1, \ldots, \kappa_q$,
and denote the subdeterminant itself by

$a_{i_1 \ldots i_q}^{\kappa_1 \ldots \kappa_q}$.

By the algebraic complement of this subdeterminant we mean the subdeterminant
(with suitable sign) formed from $A$ by deleting the columns
$i_1, \ldots, i_q$ and the rows $\kappa_1, \ldots, \kappa_q$. If we denote its columns by $i_{q+1}, \ldots, i_n$
and its rows by $\kappa_{q+1}, \ldots, \kappa_n$, then $i_1, \ldots, i_q, i_{q+1}, \ldots, i_n$ and $\kappa_1, \ldots, \kappa_q$,
$\kappa_{q+1}, \ldots, \kappa_n$ are permutations of $1, \ldots, n$, and the desired algebraic complement
is given by

$A^{i_1 \ldots i_q}_{\kappa_1 \ldots \kappa_q} = (-1)^i (-1)^\kappa\, a_{i_{q+1} \ldots i_n}^{\kappa_{q+1} \ldots \kappa_n}$,

where $(-1)^i$ and $(-1)^\kappa$ are the signs of these two permutations.

The generalization of (D3) is now given by the Laplace expansion: if we
choose $q$ arbitrary columns (or rows) and hold them fixed, and then
multiply every $q$-rowed subdeterminant that can be formed from them by
its algebraic complement and sum up, we obtain the determinant $A$:

(D4) $A = \sum_{(\kappa)} a_{i_1 \ldots i_q}^{\kappa_1 \ldots \kappa_q} \cdot A^{i_1 \ldots i_q}_{\kappa_1 \ldots \kappa_q} = \sum_{(\kappa)} (-1)^i (-1)^\kappa\, a_{i_1 \ldots i_q}^{\kappa_1 \ldots \kappa_q} \cdot a_{i_{q+1} \ldots i_n}^{\kappa_{q+1} \ldots \kappa_n}$.

For a fixed permutation $i$ the summation here is to be taken over all
possible choices of $q$ numbers $\kappa_1, \ldots, \kappa_q$ from among the numbers $1, \ldots, n$.

For the proof we may either write out the subdeterminants in full and
verify that exactly the same products $a_1^{\pi 1} \cdots a_n^{\pi n}$ occur as in $A$, or we may
verify (La), (Lm), (a), (n) and make use of the fact that the determinant is
the only function with these properties.


5. The inverse matrix. From (D3) it follows, if $A \ne 0$, that the matrix
$\mathfrak{A}^{-1} = (A_\nu^\kappa / A)$ satisfies the equations

$\mathfrak{A}\mathfrak{A}^{-1} = \mathfrak{A}^{-1}\mathfrak{A} = \mathfrak{E}$,

and is thus both a right and a left inverse.
If $A = 0$, then $\mathfrak{A}$ has no inverse, since it follows from $\mathfrak{A}\mathfrak{A}^{-1} = \mathfrak{E}$ by
the theorem for multiplication of determinants that

$|\mathfrak{A}|\, |\mathfrak{A}^{-1}| = 1$, so that $|\mathfrak{A}| \ne 0$.

The multiplication theorem for determinants can also be proved by direct
calculation without use of the earlier theory.
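
The construction of the inverse from the quantities in (D3) can be sketched as follows. The helper names (`det`, `cofactor`, `inverse`, `matmul`), the use of exact fractions, and the sample matrix are assumptions made for illustration; for larger matrices this approach is far too slow to be practical.

```python
from fractions import Fraction
from itertools import permutations

def det(a):
    """Determinant by the permutation sum (D1); a[r][c] is row r, column c."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[i] > perm[j]
                         for i in range(n) for j in range(i + 1, n))
        term = -1 if inversions % 2 else 1
        for col in range(n):
            term *= a[perm[col]][col]
        total += term
    return total

def cofactor(a, row, col):
    """Signed subdeterminant after striking out the given row and column."""
    n = len(a)
    sub = [[a[r][c] for c in range(n) if c != col]
           for r in range(n) if r != row]
    return (-1) ** (row + col) * det(sub)

def inverse(a):
    """Inverse as the matrix of cofactors, transposed and divided by det."""
    d = det(a)
    if d == 0:
        raise ValueError("determinant is 0, so no inverse exists")
    n = len(a)
    # Entry (i, j) of the inverse is the cofactor of a[j][i] divided by d.
    return [[Fraction(cofactor(a, j, i), d) for j in range(n)]
            for i in range(n)]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

A = [[2, 0, 1], [1, 3, 2], [1, 1, 2]]
A_inv = inverse(A)
print(matmul(A, A_inv))
print(matmul(A_inv, A))
```

Both products give the unit matrix, so the matrix built from the cofactors is a right and a left inverse.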


3.6. Applications of Determinants to Systems of Linear Equations 
1. If in

(I) $\sum_{\nu=1}^{n} a_\nu^\kappa x^\nu = b^\kappa$ $(\kappa = 1, \ldots, k)$

we have $k = n$, then for each $\lambda = 1, \ldots, n$ we multiply the $\kappa$th equation
by $A_\kappa^\lambda$ (see §3.5.3) and add:

$A_\kappa^\lambda a_\nu^\kappa x^\nu = A_\kappa^\lambda b^\kappa$.

From (D3) it follows that

(1) $A x^\lambda = A_\kappa^\lambda b^\kappa$.

Every solution of (I) is a solution of (1). If $A \ne 0$, then (1) can have only
the solution

(2) $x^\lambda = A_\kappa^\lambda b^\kappa / A$ $(\lambda = 1, \ldots, n)$.

By actual substitution it is easy to see that this system $(x^\lambda)$ is actually a
solution of (I). Thus (I) has the unique solution (2). The same remark holds
for the case with all $b^\kappa = 0$, when the unique solution is the so-called
trivial solution: all $x^\lambda = 0$.
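
Formula (2) translates into a few lines of code. The following is a minimal sketch with hypothetical names and a hypothetical system of two equations; it assumes $k = n$ and a nonzero determinant, as in the text.

```python
from fractions import Fraction
from itertools import permutations

def det(a):
    """Determinant by the permutation sum (D1); a[r][c] is row r, column c."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[i] > perm[j]
                         for i in range(n) for j in range(i + 1, n))
        term = -1 if inversions % 2 else 1
        for col in range(n):
            term *= a[perm[col]][col]
        total += term
    return total

def cofactor(a, row, col):
    """Signed subdeterminant after striking out the given row and column."""
    n = len(a)
    sub = [[a[r][c] for c in range(n) if c != col]
           for r in range(n) if r != row]
    return (-1) ** (row + col) * det(sub)

def solve(a, b):
    """Unique solution of a x = b by formula (2); requires det(a) != 0."""
    d = det(a)
    if d == 0:
        raise ValueError("determinant is 0: formula (2) does not apply")
    n = len(a)
    # For each unknown, sum the cofactors of its column times the right sides.
    return [sum(Fraction(cofactor(a, k, lam) * b[k], d) for k in range(n))
            for lam in range(n)]

# Hypothetical system: 2x + y = 5, x + 3y = 10.
print(solve([[2, 1], [1, 3]], [5, 10]))
# Homogeneous system with nonzero determinant: only the trivial solution.
print(solve([[2, 1], [1, 3]], [0, 0]))
```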


2. If $A = 0$, and also if $k \ne n$, let us find the "largest possible"
subdeterminant $\ne 0$ that can be formed by striking out rows and columns.
Let there exist an $r$-rowed subdeterminant $\ne 0$, whereas all $(r+1)$-rowed
subdeterminants $= 0$. From §3.5 (D4) it follows that all subdeterminants
with more than $r + 1$ rows are equal to zero.

By renumbering, if necessary, the equations and the $x$'s, we may assume
that

$$\begin{vmatrix} a_1^1 & \cdots & a_r^1 \\ \vdots & & \vdots \\ a_1^r & \cdots & a_r^r \end{vmatrix} \ne 0.$$

Then by 1. we can solve the first $r$ equations, written in the form

$a_1^1 x^1 + \cdots + a_r^1 x^r = -a_{r+1}^1 x^{r+1} - \cdots - a_n^1 x^n + b^1$,
$\cdots$
$a_1^r x^1 + \cdots + a_r^r x^r = -a_{r+1}^r x^{r+1} - \cdots - a_n^r x^n + b^r$,

for the $x^1, \ldots, x^r$, where the solution will contain $x^{r+1}, \ldots, x^n$ as
"parameters" whose values may be chosen arbitrarily. We can then show
that these solutions, for arbitrary values of $x^{r+1}, \ldots, x^n$, are also solutions
of the remaining equations of the original system, provided this system
has any solutions at all. The details are to be found in any textbook.

From this result it follows that the rank of a matrix (cf. §2.3) is the
number $r$ characterized by the following property: there exists at least
one $r$-rowed subdeterminant $\ne 0$, but all $(r+1)$-rowed subdeterminants
are $= 0$.
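
This characterization of the rank can be turned into a brute-force search over subdeterminants. The sketch below is meant only for very small matrices, since the number of subdeterminants grows rapidly; the function names and the sample matrices are hypothetical.

```python
from itertools import combinations, permutations

def det(a):
    """Determinant by the permutation sum (D1); a[r][c] is row r, column c."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[i] > perm[j]
                         for i in range(n) for j in range(i + 1, n))
        term = -1 if inversions % 2 else 1
        for col in range(n):
            term *= a[perm[col]][col]
        total += term
    return total

def rank(a):
    """Largest r for which some r-rowed subdeterminant is nonzero."""
    rows, cols = len(a), len(a[0])
    for r in range(min(rows, cols), 0, -1):
        for row_set in combinations(range(rows), r):
            for col_set in combinations(range(cols), r):
                sub = [[a[i][j] for j in col_set] for i in row_set]
                if det(sub) != 0:
                    return r
    return 0

print(rank([[1, 2, 3], [2, 4, 6], [1, 0, 1]]))   # second row = 2 * first row
print(rank([[1, 0, 2], [0, 1, 3]]))              # rectangular matrix
```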


3.7. Unitary Transformations of Hermitian Forms 


We now come to the proof of the assertion at the end of §3.2.
In the present context $S$ is the field of complex numbers, a basis $(e_\kappa)$
is given in $V_n$, and only unitary transformations are admissible. For the
given basis, and consequently for all admissible bases, i.e., all bases
arising from unitary transformations, let there be an inner product
defined by

$x^*y$ (in the real case: $x^T y$).

We call $x$ a unit vector if $x^*x = 1$, and we say that the vectors $x$, $y$ are
orthogonal if $x^*y = 0$. This definition satisfies all the customary requirements
for orthogonality (cf. II, 7, §2.5): namely, (1) $o$ is orthogonal to
every vector; (2) if $a$ is orthogonal to $b$, then $b$ is orthogonal to $a$; (3) the
vectors orthogonal to a given vector $a \ne o$ form an $(n-1)$-dimensional vector
space. With this definition the given basis and all other admissible bases
consist of orthogonal unit vectors.

We wish to show that a Hermitian form represented for the given basis
by $x^*\mathfrak{A}x$ (i.e., by the Hermitian matrix $\mathfrak{A}$) can be reduced by unitary
transformations to the form $y^*\mathfrak{D}y$ with diagonal matrix $\mathfrak{D}$; in other words,
there exists a unitary matrix $\mathfrak{T}$ such that

(1) $\mathfrak{T}^*\mathfrak{A}\mathfrak{T} = \mathfrak{D}$

is a diagonal matrix.

In the real field this matrix produces the orthogonal transformation of a
quadratic form into a sum of squares; but the proofs in the complex field
are exactly the same, so that here we discuss the more general case.

By §1.5 the columns of $\mathfrak{T}$ are the coordinates of the new basis vectors
with respect to the old basis. So these coordinates are precisely what we
are looking for; i.e., we seek a suitable system of orthogonal unit vectors.

Since $\mathfrak{T}^* = \mathfrak{T}^{-1}$, it follows from (1) that $\mathfrak{A}\mathfrak{T} = \mathfrak{T}\mathfrak{D}$. In this matrix
equation let us consider the $i$th column. If $e_{i'}$ is the $i$th column of $\mathfrak{T}$, and
$d_i$ is the $i$th diagonal element of $\mathfrak{D}$, we obtain

$\mathfrak{A} e_{i'} = d_i e_{i'}$ (no summation over $i$).

Thus the desired vectors of the new basis are the solutions of the equation

(2) $\mathfrak{A} v = d v$.

This analysis of the problem shows that we must find $n$ mutually orthogonal
vectors which are solutions of (2), where $d$ may have various values
still to be suitably determined.

If $v$ is a solution of (2), then so is $sv$, so that for each solution we may
arrange that $v^*v = 1$.

It is easy to verify that the matrix $\mathfrak{T}$ formed from $n$ such vectors provides
the desired transformation of $\mathfrak{A}$.
The trivial solution of (2) is not a solution of our problem. A nontrivial
solution exists only if

(Ch) $$|\mathfrak{A} - d\mathfrak{E}| = \begin{vmatrix} a_1^1 - d & \cdots & a_n^1 \\ \vdots & \ddots & \vdots \\ a_1^n & \cdots & a_n^n - d \end{vmatrix} = 0.$$

This equation is called the characteristic equation of the matrix $\mathfrak{A}$, and its
solutions are the eigenvalues of $\mathfrak{A}$.

In some contexts it is customary to define the eigenvalues as the solutions
of $|d\mathfrak{A} - \mathfrak{E}| = 0$.

To each eigenvalue there correspond nontrivial solutions of (2), which
are called eigenvectors of the matrix $\mathfrak{A}$. The equation (Ch) is of the $n$th
degree in $d$:

(3) $A_n - A_{n-1} d + \cdots + (-1)^{n-1} A_1 d^{n-1} + (-1)^n d^n = 0$.

The coefficients are sums of principal subdeterminants, formed by striking
out rows and columns with the same indices. In particular,

$A_n = A = |\mathfrak{A}|$,

$A_1 = \sum_{i=1}^{n} a_i^i = \operatorname{tr} \mathfrak{A}$, the trace of $\mathfrak{A}$.
t=] 
The characteristic equation has the following invariant property. If $\mathfrak{T}$ is an
arbitrary regular matrix, then

$|\mathfrak{A} - d\mathfrak{E}| = |\mathfrak{T}^{-1}|\, |\mathfrak{A} - d\mathfrak{E}|\, |\mathfrak{T}| = |\mathfrak{T}^{-1}\mathfrak{A}\mathfrak{T} - d\,\mathfrak{T}^{-1}\mathfrak{T}|$
    $= |\mathfrak{T}^{-1}\mathfrak{A}\mathfrak{T} - d\mathfrak{E}|$.

Consequently, similar matrices have the same characteristic equation and
thus also the same eigenvalues. Since the $A_i$ are the elementary symmetric
functions of the eigenvalues, they also, and in particular the determinant
and the trace, are invariant under similarity transformations (in the sense
of equation (1) in §2.5).
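
The description of the coefficients in (3) as sums of principal subdeterminants, and their invariance under similarity transformations, can be checked on a small integer example. Everything named in this sketch (`principal_sums`, `char_poly_value`, the matrices `A`, `T`, `T_inv`) is a hypothetical choice for illustration.

```python
from itertools import combinations, permutations

def det(a):
    """Determinant by the permutation sum; the empty matrix has determinant 1."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[i] > perm[j]
                         for i in range(n) for j in range(i + 1, n))
        term = -1 if inversions % 2 else 1
        for col in range(n):
            term *= a[perm[col]][col]
        total += term
    return total

def principal_sums(a):
    """[A_0, A_1, ..., A_n]: A_j is the sum of all j-rowed principal
    subdeterminants (same row and column indices kept); A_0 = 1."""
    n = len(a)
    return [sum(det([[a[i][k] for k in idx] for i in idx])
                for idx in combinations(range(n), j))
            for j in range(n + 1)]

def char_poly_value(a, d):
    """Left side of (3): A_n - A_{n-1} d + ... + (-1)^n d^n."""
    n = len(a)
    sums = principal_sums(a)
    return sum((-1) ** (n - j) * sums[j] * d ** (n - j) for j in range(n + 1))

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
T = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
T_inv = [[1, -1, 0], [0, 1, 0], [0, 0, 1]]
B = matmul(matmul(T_inv, A), T)          # a matrix similar to A

print(principal_sums(A), principal_sums(B))
```

The two lists agree; their second entry is the trace and their last entry the determinant.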

For the problem of finding $n$ mutually orthogonal eigenvectors of the
Hermitian matrix $\mathfrak{A}$ the following two theorems are fundamental:

1) The eigenvalues of a Hermitian matrix, and thus of a real symmetric
matrix, are real.

Proof: Let $d$ be a complex eigenvalue and $v$ a corresponding
eigenvector, so that $\mathfrak{A}v = dv$ and thus also $\bar{\mathfrak{A}}\bar{v} = \bar{d}\bar{v}$. Multiplying by
$\bar{v}^T$ and $v^T$ respectively, we obtain

(4) $\bar{v}^T \mathfrak{A} v = d\, \bar{v}^T v$,

(5) $v^T \bar{\mathfrak{A}} \bar{v} = \bar{d}\, v^T \bar{v}$.

In (5) we form the transposes:

(6) $\bar{v}^T \bar{\mathfrak{A}}^T v = \bar{d}\, \bar{v}^T v$.

Since $\bar{\mathfrak{A}}^T = \mathfrak{A}$, it follows from (4) and (6) that

$(d - \bar{d})\, \bar{v}^T v = 0$.

But

$\bar{v}^T v = \sum_{\nu=1}^{n} \bar{v}^\nu v^\nu \ne 0$, so that $d = \bar{d}$,

as desired.

2) Eigenvectors belonging to distinct eigenvalues are mutually orthogonal.
Proof: let $\mathfrak{A} v_1 = d_1 v_1$, and $\mathfrak{A} v_2 = d_2 v_2$. Then

(7.1) $v_2^* \mathfrak{A} v_1 = d_1 v_2^* v_1$,

(7.2) $v_1^* \mathfrak{A} v_2 = d_2 v_1^* v_2$.

In (7.2) we form the conjugate transposes. Since $d_2$ is real and $\mathfrak{A}^* = \mathfrak{A}$, we
obtain

$v_2^* \mathfrak{A} v_1 = d_2 v_2^* v_1$.

Comparison with (7.1) gives

$(d_1 - d_2)\, v_2^* v_1 = 0$,

so that if $d_1 \ne d_2$, then $v_2^* v_1 = 0$, as desired.

If the equation (Ch) has n distinct zeros, these theorems show that our 
problem is completely solved. The zeros of (Ch) are themselves the 
elements of the desired diagonal matrix. If we are interested only in this 
matrix or, in other words, in the result of the transformation, the eigen- 
vectors do not need to be calculated at all. 
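
For a two-rowed Hermitian matrix with distinct eigenvalues, both theorems can be observed directly. The sketch below solves the characteristic equation by the quadratic formula; the matrix `A`, the function names, and the assumption that the off-diagonal element is nonzero are hypothetical choices made for illustration.

```python
import math

def eigen_2x2_hermitian(a):
    """Eigenvalues and (unnormalized) eigenvectors of a 2x2 Hermitian matrix.

    Assumption of this sketch: a[0][1] != 0, so the eigenvalues are distinct
    and (a[0][1], d - a[0][0]) is a nonzero solution of (A - dE) v = 0.
    """
    trace = (a[0][0] + a[1][1]).real
    determinant = (a[0][0] * a[1][1] - a[0][1] * a[1][0]).real
    # The discriminant equals (a00 - a11)^2 + 4 |a01|^2 and is never negative.
    root = math.sqrt(trace * trace - 4 * determinant)
    values = [(trace - root) / 2, (trace + root) / 2]
    vectors = [[a[0][1], d - a[0][0]] for d in values]
    return values, vectors

def inner(u, v):
    """Inner product u* v = sum of conj(u[k]) * v[k]."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

A = [[2 + 0j, 1 - 1j],
     [1 + 1j, 3 + 0j]]
values, vectors = eigen_2x2_hermitian(A)
print(values)                          # real eigenvalues
print(inner(vectors[0], vectors[1]))   # the eigenvectors are orthogonal
```

Dividing each eigenvector by the square root of `inner(v, v)` would give the unit vectors that form the columns of the unitary matrix.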

Let us give a simple example to show what may happen if (Ch) has
multiple roots. Consider the real quadratic form

(8) $x^1 x^1 + x^2 x^2 + c\, x^3 x^3$.

The fact that the matrix is already in diagonal form will make the calculation
shorter. The characteristic equation is

(9) $(1 - d)^2 (c - d) = 0$.

The system of equations (2) becomes

(10) $(1 - d)x^1 = 0$,
     $(1 - d)x^2 = 0$,
     $(c - d)x^3 = 0$.

For the double root $d = 1$ of (9) the matrix in (10) has the rank $n - 2$, so
that the solutions of (10) form a two-dimensional vector space, in which
we can find two mutually orthogonal solution vectors (as eigenvectors),
one of which may be chosen arbitrarily. In our case the eigenvectors
(normalized by $v^*v = 1$) comprise all the following vectors with arbitrary $\varphi$:

$v_1 = (\cos\varphi, \sin\varphi, 0)$,
$v_2 = (-\varepsilon \sin\varphi, \varepsilon \cos\varphi, 0)$, $\varepsilon = \pm 1$,
$v_3 = (0, 0, 1)$.

In geometric language this result means that in an ellipsoid of rotation the
principal axes are not all uniquely determined.

In general, we can prove the existence of $n$ orthogonal eigenvectors in
a totally different manner, which depends on the property of a quadric
that its principal axes are of extremal (more precisely: of stationary)
length. A quadric is defined by

$x^*\mathfrak{A}x = 1$,

and the length of a vector is measured by $x^*x$. Let us ask the question:
When does $x^*x$ assume a stationary value under the subsidiary condition
$x^*\mathfrak{A}x = 1$ or, what amounts to the same thing, when does $x^*\mathfrak{A}x$ assume
a stationary value under the subsidiary condition $x^*x = 1$? Introducing
the Lagrange multiplier $k$, we see that the partial derivatives of

$x^*\mathfrak{A}x - k\, x^*x$

must be set equal to zero. The calculation is slightly different for the real
and the complex case. In the real field we must form the equations (for the
meaning of the $\delta_{\iota\kappa}$, see §1.5)

$\frac{\partial}{\partial x^i} [x^\kappa (a_{\kappa\lambda} - k\delta_{\kappa\lambda}) x^\lambda] = (a_{i\lambda} - k\delta_{i\lambda}) x^\lambda + x^\kappa (a_{\kappa i} - k\delta_{\kappa i}) = 0$.

Since $a_{i\kappa} = a_{\kappa i}$, it follows that

$(a_{i\kappa} - k\delta_{i\kappa}) x^\kappa = 0$ or $(\mathfrak{A} - k\mathfrak{E})x = 0$.

In the complex case we may consider $x^i$ and $\bar{x}^i$ as independent variables
and then construct the equations

(a) $\frac{\partial}{\partial x^i} [\bar{x}^\kappa (a_{\kappa\lambda} - k\delta_{\kappa\lambda}) x^\lambda] = \bar{x}^\kappa (a_{\kappa i} - k\delta_{\kappa i}) = 0$,

(b) $\frac{\partial}{\partial \bar{x}^i} [\bar{x}^\kappa (a_{\kappa\lambda} - k\delta_{\kappa\lambda}) x^\lambda] = (a_{i\lambda} - k\delta_{i\lambda}) x^\lambda = 0$.

In matrix notation we have

(a) $x^*(\mathfrak{A} - k\mathfrak{E}) = 0$, (b) $(\mathfrak{A} - k\mathfrak{E})x = 0$.

Since $\mathfrak{A}^* = \mathfrak{A}$, these two equations have exactly the same meaning,
namely (2). The existence of the desired solution-vectors now follows
from a theorem of analysis. Since the "points" for which $x^*x = 1$ form
a closed and bounded set, and since $x^*\mathfrak{A}x$ is continuous, there exists a vector $x = v_1$
for which $x^*\mathfrak{A}x$ assumes an extreme value with $x^*x = 1$. Next we
determine a vector $v_2$ such that $v_2^*\mathfrak{A}v_2$ assumes an extreme value under
the subsidiary conditions $v_2^*v_2 = 1$, $v_2^*v_1 = 0$. We proceed in this way
until the equations $v_k^*v_i = 0$ for all $i < k$ can no longer be satisfied, i.e.,
until we have $n$ such vectors.

We shall omit the proof that the matrix $\mathfrak{T} = (v_\kappa^\nu)$ constructed in this
way actually has the desired properties.

Thus we have reached the desired result that every Hermitian matrix can 
be transformed into a diagonal matrix by a unitary matrix T with 


$$T^{*}\mathfrak{A}T = D.$$


The elements of D are the eigenvalues of the matrix $\mathfrak{A}$; i.e., they are the 
solutions of the equation 


$$|\mathfrak{A} - d\,\mathfrak{E}| = 0.$$


The number of nonzero $d_i$ is equal to the rank of D, and thus also to the 
rank of $\mathfrak{A}$. 
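The stationary-value characterization can be checked numerically in the simplest real case. The following is only a sketch under stated assumptions: the 2×2 symmetric matrix `A` is a hypothetical example chosen for illustration (it does not occur in the text), and the unit circle x*x = 1 is merely sampled, so the agreement with the roots of the characteristic equation is approximate.

```python
import math

# Hypothetical real symmetric example matrix (an assumption, not from the text).
A = [[2.0, 1.0],
     [1.0, 2.0]]

def quadratic_form(A, x):
    """Return x^T A x for a real vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def extremes_on_unit_circle(A, steps=3600):
    """Sample x = (cos t, sin t) and return (min, max) of x^T A x."""
    values = []
    for s in range(steps):
        t = 2.0 * math.pi * s / steps
        values.append(quadratic_form(A, (math.cos(t), math.sin(t))))
    return min(values), max(values)

def eigenvalues_2x2(A):
    """Roots of |A - dE| = 0 for a symmetric 2x2 matrix, smallest first."""
    trace = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = math.sqrt(trace * trace - 4.0 * det)
    return (trace - root) / 2.0, (trace + root) / 2.0

lo, hi = extremes_on_unit_circle(A)
d1, d2 = eigenvalues_2x2(A)
print(lo, hi)   # sampled extreme values of the quadratic form
print(d1, d2)   # eigenvalues from the characteristic equation
```

For this particular matrix the form equals 2 + sin 2t on the unit circle, so the sampled extremes should lie very close to the two eigenvalues.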

If in addition to transformations with unitary matrices we allow 
transformations 


$$\mathfrak{A}' = \mathfrak{S}^{*}\mathfrak{A}\mathfrak{S}$$


with arbitrary regular matrices $\mathfrak{S}$, then, as was shown in §3.2, we can 
reduce the nonzero terms of a real diagonal matrix to ±1. 


List of Formulas 


I        System of linear equations                    §1.1, §2.4, §3.6 
II       System of homogeneous linear equations        §1.1, §2.4 
G        Equality of n-tuples                          §1.2 
A        Addition of n-tuples                          §1.2 
A1-A6    Laws of addition                              §1.2 
S-M      Multiplication of an n-tuple with a scalar    §1.2 
M1-M6    Laws of S-multiplication                      §1.2 
         Properties of linear transformations          §2.1, §2.6, §3.1, §3.4 


Orthogonality and normalization of matrices §2.6 
Postulates for products of vectors §3.1 
Transformation of II,, §3.1 

Alternation (for the outer product) §3.4 
Determinant §3.4 

Normalization of the determinant §3.4 

Rules for calculation with determinants §3.5 
Characteristic equation §3.7 


CHAPTER 4 


Polynomials 


1. Entire Rational Functions 


1.1. Definition and Standard Notation 


By an entire rational function we mean a function (defined, let us say, 
for all real numbers and assuming real values) that can be constructed 
by addition and multiplication alone. Of course, we must make this remark 
more precise, and in doing so we shall free ourselves from any definite 
domain of numbers, considering instead an arbitrary commutative ring 
R with unit¹ element 1 (see IB1, §2.4). Thus in what follows we may take 
R to be the field of real numbers, or of rational numbers, or also the ring 
of integers, but in each case we must take care to use only the properties 
implied by the definition of a "commutative ring with unit element." 
We now consider a function of one argument, defined in R and with 
values in R; in other words, we consider mappings of R into itself. Two of 
these mappings have a particularly simple character; namely, for every 
c ∈ R the constant function x → c (which to every argument x ∈ R assigns 
the value c) and the identical function x → x. For every pair f, g 
of mappings of the ring R into itself we can form the further mappings 


$$x \to f(x) + g(x), \qquad x \to f(x)\,g(x).$$
We denote these by f + g and fg, so that 
$$(f + g)(x) = f(x) + g(x), \qquad (fg)(x) = f(x)\,g(x).$$


It is obvious that these mappings again take R into itself. In the set of 
mappings of R into itself we have thus defined an addition and a 
multiplication.² By an entire rational function of one argument in R we now mean 
the constant functions, the identity function, and every mapping of R 
into itself that can be formed from these functions by repeated application 
of addition and multiplication (a finite number of times). Entire rational 
functions of several arguments can be defined in the same way, but we 
shall introduce them in a different manner in §2.3, so that for the present 
we restrict ourselves, without special mention of the fact, to functions of 
one argument. 


1 For this element we use the symbol 1 from "force of habit" without implying 
thereby that the natural numbers are contained in R. 

As a result of the rules for computation in R, the set of entire rational 
functions can be characterized very simply: 

A mapping f of R into itself is an entire rational function in R if and 
only if there exist elements $a_0, \ldots, a_n \in R$ such that for all x ∈ R we have 
the equation³ 

(1) $$f(x) = \sum_{i=0}^{n} a_i x^i.$$


To show that every function f defined by (1) (for all x ∈ R) is an entire 
rational function, we introduce the notation $\underline{c}$ for the constant function 
x → c and I for the identical function x → x. Then (if we set $I^0 = \underline{1}$), 
it is obvious that $x \to a_i x^i$ is the function $\underline{a_i} I^i$ and thus from (1) it follows 
that $f = \sum_{i=0}^{n} \underline{a_i} I^i$; that is, f can be formed from I and the $\underline{a_i}$ by addition 
and multiplication. In order to show, on the other hand, that every entire 
rational function f can be represented in the form (1), it is only necessary 
to prove that this statement holds for $\underline{c}$ and I and that it holds for f + g 
and fg if it holds for f and g. For $\underline{c}$ we need only set $a_0 = c$, n = 0 in (1), 
and for I we set $a_0 = 0$, $a_1 = 1$, n = 1; if, besides the representation (1), 
we also have $g(x) = \sum_{k=0}^{m} b_k x^k$, then we define $a_i = 0$, $b_k = 0$ for 
i > n, k > m and obtain,⁴ with l = max(m, n), 

(2) $$(f + g)(x) = f(x) + g(x) = \sum_{i=0}^{l} (a_i + b_i) x^i,$$

(3) $$(fg)(x) = f(x)\,g(x) = \sum_{i=0}^{n} \sum_{k=0}^{m} a_i b_k x^{i+k} = \sum_{h=0}^{n+m} \Big( \sum_{i=0}^{h} a_i b_{h-i} \Big) x^h,$$


which completes the proof. 

If $f \neq \underline{0}$, then in (1) we may obviously assume $a_n \neq 0$. But are n and the 
$a_i$ already uniquely determined by f? Since in (2) we may obviously replace 
+ by − on both sides, we see that the answer to this question is affirmative 
if and only if $f = \underline{0}$ in (1) implies $a_0 = \cdots = a_n = 0$. In §1.2 we shall see 
that this is certainly the case if R has no divisors of zero and contains 
infinitely many elements. But in the theory of numbers it will often be 
necessary to consider rings with only finitely many elements. One example 
is the ring containing only two elements, namely the zero element 0 and 
the unit element 1, with 1 + 1 = 0 (the residue class ring G/2 in IB5, §3.7), 
which has $x^2 + x = 0$ for all x; in other words, the entire rational function 
$x \to x^2 + x$ is the constant function 0, even though it has the representation 
(1) with $a_0 = 0$, $a_1 = a_2 = 1$, n = 2. 


2 It is easy to see that with these operations the mappings form a commutative ring 
with unit element. 

3 In order that this notation may be applicable to the case x = 0 we must define 
$0^0 = 1$, as we shall do throughout the present chapter. 

4 In (3) we must apply IB1, (41) (20). 
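The two-element example can be verified by direct enumeration. The sketch below is our own illustration: the helper names `evaluate` and `is_zero_function` are hypothetical, and the residue class ring is modelled by Python integers reduced modulo `modulus`.

```python
def evaluate(coeffs, x, modulus):
    """Value of sum(coeffs[i] * x**i) in the residue class ring Z/modulus."""
    return sum(a * pow(x, i, modulus) for i, a in enumerate(coeffs)) % modulus

def is_zero_function(coeffs, modulus):
    """True if the entire rational function vanishes at every element."""
    return all(evaluate(coeffs, x, modulus) == 0 for x in range(modulus))

# a_0 = 0, a_1 = a_2 = 1, n = 2: the function x -> x^2 + x
coeffs = [0, 1, 1]
print(is_zero_function(coeffs, 2))  # over the ring with two elements
print(is_zero_function(coeffs, 3))  # over the ring with three elements
```

Over the ring with two elements the function vanishes identically although its coefficients do not; over the ring with three elements the same coefficients give a function that is not identically zero.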


1.2. Zeros 


In this subsection we let R be a commutative ring with unit element and 
without divisors of zero. By a zero of the entire rational function f in R 
we mean an element α ∈ R such that f(α) = 0. If not every element of R 
is a zero of f, then f has only finitely many zeros.⁵ More precisely, we prove 
the following theorem: 


If f admits a representation (1) with $a_n \neq 0$, then the number of zeros 
of f is at most n. 
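In the first place, we deduce from (1), and from the fact that 

$$(x - \alpha) \sum_{k=0}^{i-1} x^k \alpha^{i-1-k} = \sum_{k=0}^{i-1} x^{k+1} \alpha^{i-1-k} - \sum_{k=0}^{i-1} x^k \alpha^{i-k} = \sum_{k=1}^{i} x^k \alpha^{i-k} - \sum_{k=0}^{i-1} x^k \alpha^{i-k} = x^i - \alpha^i$$

for n > 0, the equation 

(4) $$f(x) - f(\alpha) = (x - \alpha)\, f_1(x),$$

with 

(5) $$f_1(x) = \sum_{k=0}^{n-1} a'_k x^k, \qquad a'_k = \sum_{i=k+1}^{n} a_i \alpha^{i-k-1}.$$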


The theorem can now be proved by complete induction on n. For n = 0 
the function f has no zeros, since $a_0 \neq 0$, so that the assertion is true. 
For n > 0, the induction hypothesis can be applied to $f_1$, since 
$a'_{n-1} = a_n$; in other words, $f_1$ has at most n − 1 zeros. Now if α is a zero 
of f, then (4) gives us the equation $f(x) = (x - \alpha) f_1(x)$. For a zero x ≠ α 
of f we thus have $(x - \alpha) f_1(x) = 0$, and therefore $f_1(x) = 0$, since 
x − α ≠ 0 and R has no divisors of zero. Thus f has at most⁶ one zero 
more than $f_1$ and therefore at most n zeros. 


5 Since by IB1, §1.5, the empty set is to be considered as a finite set, the case where f 
has no zero is included here. 

6 It may happen that α is also a zero of $f_1$; cf. §2.2. 

From this theorem it follows that (1) holds with $a_n \neq 0$ and f(x) = 0 for 
all x ∈ R only if R has at most n elements. It follows at once that if R has 
infinitely many elements and if f(x) = 0 for all x ∈ R, then (1) implies 
$a_0 = \cdots = a_n = 0$. 
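The hypothesis that R has no divisors of zero cannot be dropped from the theorem. As a small illustration (our own example, with the hypothetical helper `zeros` and the residue class rings modulo 7 and modulo 8 modelled by Python integers), the function x → x² − 1 with n = 2 has two zeros in the first ring but four in the second, where 2 · 4 = 0.

```python
def zeros(coeffs, modulus):
    """All zeros in Z/modulus of the function x -> sum(coeffs[i] * x**i)."""
    def value(x):
        return sum(a * pow(x, i, modulus) for i, a in enumerate(coeffs)) % modulus
    return [x for x in range(modulus) if value(x) == 0]

# f(x) = x^2 - 1, i.e. a_0 = -1, a_1 = 0, a_2 = 1, so n = 2
f = [-1, 0, 1]
zeros_mod_7 = zeros(f, 7)   # Z/7 has no divisors of zero: at most n zeros
zeros_mod_8 = zeros(f, 8)   # Z/8 has divisors of zero, the bound may fail
print(zeros_mod_7, zeros_mod_8)
```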


1.3. Horner’s Rule 

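From the definition of the $a'_k$ in (5) it follows at once that $a'_{n-1} = a_n$, 
$a'_{k-1} = a'_k \alpha + a_k$ (k = 1, ..., n − 1), $f(\alpha) = a'_0 \alpha + a_0$. These equations 
lead to the following simple rule, due to Horner, for calculating⁷ f(α): 


$$\begin{array}{ccccc}
a_n & a_{n-1} & \cdots & a_1 & a_0 \\
 & a'_{n-1}\alpha & \cdots & a'_1\alpha & a'_0\alpha \\ \hline
a'_{n-1} & a'_{n-2} & \cdots & a'_0 & f(\alpha)
\end{array}$$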


The vertical arrows denote addition (each entry of the bottom row is the 
sum of the two entries above it), and the diagonal ones multiplication 
with α (each entry of the middle row is α times the bottom-row entry to 
its left). This procedure produces not only f(α) but also the $a'_k$ and therefore 
$f_1$. If we apply the same procedure to $f_1$ in the case n > 1, we obtain, 
corresponding to (4), (5), the values of $f_1(\alpha)$, the $a''_k$ with $f_2(x) = \sum_{k=0}^{n-2} a''_k x^k$, 
and $f_1(x) = f_1(\alpha) + (x - \alpha) f_2(x)$, so that after substituting these values 
in (4) we have 


$$f(x) = f(\alpha) + f_1(\alpha)(x - \alpha) + (x - \alpha)^2 f_2(x).$$


Continuing this way, we obtain recursively $f_h$, the $a_k^{(h)}$ (k = 0, ..., n − h) 
for h = 1, ..., n with $f_h(x) = \sum_{k=0}^{n-h} a_k^{(h)} x^k$, and $f_{h-1}(x) = f_{h-1}(\alpha) + (x - \alpha) f_h(x)$, 
where we have set $f_0 = f$, $a_k^{(0)} = a_k$. By complete 
induction on h we have 

(6) $$f(x) = \sum_{i=0}^{h-1} f_i(\alpha)(x - \alpha)^i + (x - \alpha)^h f_h(x) \quad \text{for } h \le n.$$


Since $f_n$ is obviously a constant, so that $f_n(x) = f_n(\alpha)$, we further obtain 
from (6) 


(7) $$f(x) = \sum_{i=0}^{n} f_i(\alpha)(x - \alpha)^i,$$


which converts the representation (1) from x to x − α. By complete 
induction on h it is easy to show that $a^{(h)}_{n-h} = a_n$ and thus⁸ 


$$f_n(\alpha) = a^{(n)}_0 = a_n.$$


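7 The rule is particularly convenient for use with a slide rule, since the only 
multiplications are with the constant factor α. 

Applied to $f(x) = 2x^4 - 3x^3 + x^2 - 1$ and α = 1, Horner's rule 
gives: 


$$\begin{array}{rrrrr}
2 & -3 & 1 & 0 & -1 \\
  & 2 & -1 & 0 & 0 \\ \hline
2 & -1 & 0 & 0 & -1 \\
  & 2 & 1 & 1 & \\ \hline
2 & 1 & 1 & 1 & \\
  & 2 & 3 & & \\ \hline
2 & 3 & 4 & & \\
  & 2 & & & \\ \hline
2 & 5 & & & \\ \hline
2 & & & &
\end{array}$$

The last entries −1, 1, 4, 5, 2 of the successive sum rows are the values 
$f(1), f_1(1), \ldots, f_4(1)$, so that 

$$f(x) = 2(x - 1)^4 + 5(x - 1)^3 + 4(x - 1)^2 + (x - 1) - 1.$$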
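The repeated scheme can be sketched in a few lines. This is an illustrative sketch, not part of the original text: the function names `horner_step` and `expand_about` are our own, and the coefficients are assumed to be ordinary Python numbers listed from the highest power down, as in the top row of the scheme.

```python
def horner_step(coeffs, alpha):
    """One Horner scheme for coefficients given highest power first.

    Returns (quotient coefficients a'_{n-1}, ..., a'_0, value f(alpha)).
    """
    row = [coeffs[0]]
    for a in coeffs[1:]:
        # diagonal arrow: multiply by alpha; vertical arrow: add
        row.append(row[-1] * alpha + a)
    return row[:-1], row[-1]

def expand_about(coeffs, alpha):
    """Coefficients f_0(alpha), ..., f_n(alpha) of (7), lowest power first."""
    result = []
    current = list(coeffs)
    while current:
        current, value = horner_step(current, alpha)
        result.append(value)
    return result

# f(x) = 2x^4 - 3x^3 + x^2 - 1 and alpha = 1, as in the worked example
print(expand_about([2, -3, 1, 0, -1], 1))
```

For the worked example the returned list is the sequence of values f(1), f_1(1), ..., f_4(1), that is, the coefficients of the expansion in powers of x − 1.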


We can also use the rule to transform the expression $(1 + z)^n$ by setting 
$f(x) = x^n$, α = 1, and z = x − 1; if, for simplicity, we omit the 
intermediate rows (with the $a_k^{(h)} \alpha$), we obtain: 


$$\begin{array}{cccccc}
1 & 0 & 0 & \cdots & 0 & 0 \\
1 & 1 & 1 & \cdots & 1 & 1 \\
1 & 2 & 3 & \cdots & n & \\
1 & 3 & 6 & \cdots & & \\
\vdots & & & & &
\end{array}$$


If we strike out the first row, number the remaining rows and columns from 
0 to n, and denote by $c_{ik}$ the number in the ith row and kth column, then 
by Horner's rule (each entry is the sum of its left neighbor and of the entry 
above it) we have 


(8) $$c_{0k} = c_{i0} = 1, \qquad c_{i+1,k+1} = c_{i+1,k} + c_{i,k+1} \quad \text{for } 0 \le i + k \le n - 2$$
and 
(9) $$(1 + z)^n = \sum_{k=0}^{n} c_{k,n-k}\, z^k.$$


The $c_{ik}$ thus recursively determined by (8) are called the binomial 
coefficients. By complete induction on n it is easy to show from (8) that 
$c_{ik} = \binom{i+k}{i}$, where as usual we have set 

(10) $$\binom{n}{i} = \prod_{k=1}^{i} (n - i + k) \Big/ \prod_{k=1}^{i} k = \binom{n}{n-i}.$$

If we multiply (9) by $a^n$ and set az = b, we obtain the binomial theorem 

(11) $$(a + b)^n = \sum_{i=0}^{n} \binom{n}{i} a^{n-i} b^i.$$


8 Since the method of comparison of coefficients (see §2.1) is not yet at our disposal, 
we cannot simply deduce this fact from (7). 

The table for the $c_{ik}$ is usually turned through 45° and called the Pascal 
triangle: 

            1
          1   1
        1   2   1
      1   3   3   1
    1   4   6   4   1


Since we are working here in an arbitrary commutative ring R with unit 
element 1, the $c_{ik}$ are not to be considered as natural numbers but rather 
as elements $\sum_{i=1}^{m} 1$ of R. But then we could have 1 + 1 = 0, for example 
(cf. §1.1), so that the quotients in (10) would cause trouble, which may 
be avoided by regarding the $c_{ik}$ as natural numbers and interpreting an 
expression like mr (m a natural number, r ∈ R) in (9) and (11) as $\sum_{i=1}^{m} r$. 
With this precaution the result (11) holds in any commutative ring with 
unit element.⁹ 
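The recursion (8) and the precaution just described can be tried out together. In the following sketch (our own illustration) the ring R is assumed to be a residue class ring modulo `modulus`, the helper names are hypothetical, and a product mr with m a natural number is formed literally as a sum of m summands r.

```python
def binomial_table(n):
    """Table c[i][k] for i + k <= n built from the recursion (8)."""
    c = [[1] * (n + 1 - i) for i in range(n + 1)]   # c[0][k] = c[i][0] = 1
    for i in range(1, n + 1):
        for k in range(1, n + 1 - i):
            c[i][k] = c[i][k - 1] + c[i - 1][k]
    return c

def times(m, r, modulus):
    """Interpret m*r as the sum of m summands r in Z/modulus."""
    total = 0
    for _ in range(m):
        total = (total + r) % modulus
    return total

def binomial_sum(a, b, n, modulus):
    """Right-hand side of (11), evaluated in Z/modulus."""
    c = binomial_table(n)
    total = 0
    for i in range(n + 1):
        term = pow(a, n - i, modulus) * pow(b, i, modulus) % modulus
        total = (total + times(c[i][n - i], term, modulus)) % modulus
    return total

row_4 = [binomial_table(4)[i][4 - i] for i in range(5)]
print(row_4)
print(binomial_sum(3, 5, 4, 6), pow(3 + 5, 4, 6))
```

The list `row_4` collects the entries $c_{i,4-i}$ that appear in (9) for n = 4, and the last line compares both sides of (11) in a ring that has divisors of zero.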


2. Polynomials 


2.1. Formation of a Ring of Polynomials 


In §1.1 we have already seen that in general the function $x \to \sum_{i=0}^{n} a_i x^i$ 
does not uniquely determine the $a_i$. But for calculation with such 
expressions as $\sum_{i=0}^{n} a_i x^i$ it would be very convenient to be able to assume 
that the coefficients $a_i$ are uniquely determined by the values of the 
expression. This will unquestionably be the case (for an element x with 
certain properties) if in R or in a suitable extension of R we can find an 
element x such that an equation $\sum_{i=0}^{n} a_i x^i = 0$ always implies 
$a_0 = \cdots = a_n = 0$; for then we can recognize, as in §1.1, that 
$\sum_{i=0}^{n} a_i x^i = \sum_{i=0}^{n} b_i x^i$ implies (comparison of coefficients) the equations 
$a_i = b_i$ (i = 0, ..., n). An element x with this property will be called a 
transcendent over R. If R is the field of rational numbers, then in agreement 
with the definition in IB6, §8.1, any transcendental number may be chosen 
as a transcendent over R in the present sense. Since a transcendent x 
cannot satisfy any algebraic equation $\sum_{i=0}^{n} a_i x^i = 0$ with $a_n \neq 0$, it 
cannot be characterized (i.e., determined) by statements involving only 
x, elements of R, and equality, addition, and multiplication in R. Thus 
the transcendents are also called indeterminates.¹⁰ But a name of this sort 
must not be allowed to conceal the fact that a transcendent must be a 
definite element (of an extension ring of R) and that the existence of such 
elements must in every case be proved. As an indeterminate over the field 
of rational numbers we may, as remarked above, choose any 
transcendental number (such as e or π). 


9 Of course, the proof could have been carried out independently of Horner's rule. 
We can also dispense with the existence of a unit element in R if we agree that in (11) 
$a^n b^0$, $a^0 b^n$ are to be interpreted simply as $a^n$, $b^n$. 

Thus it becomes our task to extend the commutative ring R with unit 
element 1 to a commutative ring R’ containing a transcendent x over R. 
By saying that R’ arises from extension of R and may thus be called an 
extension ring we mean that the elements of R are all contained in R’, 
and that addition and multiplication of these elements leads to the same 
result in R’ as in R; we express the same idea by saying that R is a subring 
of R’. Throughout the present chapter we shall make the tacit assumption 
that the unit element of R is also the unit element of R’, and we shall also 
assume that all the rings in question are commutative. 

Such a ring R’ certainly contains all the expressions $\sum_{i=0}^{m} a_i x^i$. But by 
the definition of a transcendent these expressions are in one-to-one 
correspondence with the sequences $(a_n)_{n=0,1,\ldots}$, if from the sequence 
$a_0, \ldots, a_m$ we construct an infinite sequence by setting $a_n = 0$ for n > m. 
So let us see what will happen if for R’ we simply take the set of sequences¹² 
$a = (a_n)_{n=0,1,\ldots}$ with the property that there exists a natural number m, 
such that $a_n = 0$ for n > m. Motivated by (2) and (3), we now define 
addition and multiplication in R’ by 


(12) $$(a + b)_n = a_n + b_n, \qquad (ab)_n = \sum_{i=0}^{n} a_i b_{n-i},$$

from which it is easy to see that the sequences a + b, ab are again 
contained in R’. With respect to this addition the set R’ is obviously a 
module, and we see at once that the multiplication is commutative and 
distributive. Finally, associativity is shown thus: 


$$((ab)c)_n = \sum_{k=0}^{n} \sum_{i=0}^{k} a_i b_{k-i} c_{n-k} = \sum_{i=0}^{n} \sum_{k=i}^{n} a_i b_{k-i} c_{n-k} = \sum_{i=0}^{n} \sum_{h=0}^{n-i} a_i b_h c_{n-i-h} = (a(bc))_n.$$


Consequently, R’ is a commutative ring. Also, it is easy to show that 
a → (a, 0, ...) is an isomorphism of the ring R into the ring R’. Thus we 
may extend the equality (cf. IB1, §4.4), which up to now has been defined 
only between elements of R and elements of R’, by setting 


(13) $$a = (a, 0, \ldots) \quad \text{and} \quad (a, 0, \ldots) = a \quad \text{for } a \in R.$$


Then R is a subring of R’, and the unit element 1 of R is also the unit 
element of R’; moreover, the zero element of R is also the zero element 
of R’, as follows at once from (12), (13). 


10 In §§2 and 3 the symbol x will almost always denote an indeterminate; more 
precisely, x is a variable for which only indeterminates can be substituted. On the other 
hand, in §1 the variable x (provided it is not bound) may be replaced by any of the 
elements of a ring. 

11 If R’ or R is a field, we speak of an extension field or subfield, respectively. 

12 For the notation see IB1, §4.4. Instead of simply writing a we shall sometimes use 
the more complete symbol $(a_0, \ldots, a_n, 0, \ldots)$. 

Then R’ certainly contains a transcendent over R, namely z = (0, 1, 0, ...). 
To prove this we derive the equation 


(14) $$(a_0, \ldots, a_n, 0, \ldots) = \sum_{i=0}^{n} a_i z^i$$

by complete induction on n. For n = 0 the equation follows from (13). 
From (14) for an integer n ≥ 0 we have 


$$(a_0, \ldots, a_{n+1}, 0, \ldots) = \sum_{i=0}^{n} a_i z^i + (b_0, \ldots, b_{n+1}, 0, \ldots),$$


if $b_i = 0$ for i = 0, ..., n and $b_{n+1} = a_{n+1}$. For the case in which the 
sequence $a_0, \ldots, a_n$ in (14) is replaced by $0, \ldots, 0, a_{n+1}$, we further obtain from (14) 


$$a_{n+1} z^n = (c_0, \ldots, c_n, 0, \ldots)$$


with $c_i = 0$ for i = 0, ..., n − 1 and $c_n = a_{n+1}$. From the definition of 
$b_i$, $c_i$, z it now follows at once from (12) that 


$$(c_0, \ldots, c_n, 0, \ldots)\, z = (b_0, \ldots, b_{n+1}, 0, \ldots),$$
and thus 


$$(a_0, \ldots, a_{n+1}, 0, \ldots) = \sum_{i=0}^{n} a_i z^i + a_{n+1} z^{n+1} = \sum_{i=0}^{n+1} a_i z^i,$$

which completes the proof by induction. If we now have $\sum_{i=0}^{n} a_i z^i = 0$, 
it follows from (14) that $(a_0, \ldots, a_n, 0, \ldots) = 0$; but since (13) are the 
only equations holding between an element of R and an element of R’, we 
thus have $a_0 = \cdots = a_n = 0$, so that z is a transcendent. As a 
generalization of the concepts in IB3, §1.3, we may now state the content of (14) 
in the following way: the $z^i$ (i = 0, 1, ...) form a basis of R’; that is, R’ 
is a vector space (of infinite dimension) over a domain of scalars that is 
not necessarily a skew field but only a commutative¹³ ring. 


13 The commutativity is required only for the multiplication. Compare the 
multiplication in a vector space R’ with the multiplication in an algebra of finite order (IB5, §3.9). 

Any commutative ring which, like the ring R’ just constructed, contains 
the ring R and also a transcendent x with respect to R, and also consists 
only of elements of the form $\sum_{i=0}^{n} a_i x^i$ ($a_i \in R$), is called a polynomial ring 
in the indeterminate (or also in the generator) x over R and its elements are 
called polynomials in x over R. The above discussion shows that a 
polynomial $y = \sum_{i=0}^{n} a_i x^i$ determines the sequence $(a_0, \ldots, a_n, 0, \ldots)$ 
uniquely. The terms of this sequence are called the coefficients of y (more 
precisely: $a_i$ = coefficient of $x^i$ in y). For y ≠ 0 it is obvious that there exists 
exactly one integer n ≥ 0 with $y = \sum_{i=0}^{n} a_i x^i$, $a_i \in R$ and $a_n \neq 0$. 
This number is called the degree of y, and $a_n$ is the leading coefficient. To 
the polynomial 0 we shall assign the degree 0, although this is not usually 
done. Then the set of polynomials of degree ≤ n (or also < n in case 
n > 0) is a module with respect to addition, as is easily shown. In 
particular, the set of polynomials of degree 0 is equal to R itself. 
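The construction of R’ by finite sequences translates almost literally into code. The sketch below is an illustration under stated assumptions: R is taken to be the ring of integers, a sequence is stored as a tuple with its trailing zeros removed, and the function names `trim`, `add`, `mul`, and `degree` are our own.

```python
def trim(seq):
    """Drop trailing zeros so each element of R' has one representation."""
    seq = list(seq)
    while seq and seq[-1] == 0:
        seq.pop()
    return tuple(seq)

def add(a, b):
    """(a + b)_n = a_n + b_n, see (12)."""
    length = max(len(a), len(b))
    a = tuple(a) + (0,) * (length - len(a))
    b = tuple(b) + (0,) * (length - len(b))
    return trim(x + y for x, y in zip(a, b))

def mul(a, b):
    """(ab)_n = sum over i of a_i * b_{n-i}, see (12)."""
    if not a or not b:
        return ()
    result = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for k, y in enumerate(b):
            result[i + k] += x * y
    return trim(result)

def degree(a):
    """Degree as defined in the text; the polynomial 0 gets degree 0."""
    return max(len(a) - 1, 0)

z = (0, 1)                      # the transcendent z = (0, 1, 0, ...)
a, b, c = (1, 2), (3, 0, 4), (5, 6)
print(mul(z, z))                # z^2
print(mul(mul(a, b), c) == mul(a, mul(b, c)))   # associativity on one sample
print(add(add((7,), mul((8,), z)), mul((9,), mul(z, z))))  # instance of (14)
```

The last line builds 7 + 8z + 9z² from elements of R and powers of z and returns the sequence of its coefficients, which is the content of (14) for n = 2.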

The ring R’ thus defined is not the only polynomial ring in an 
indeterminate x over R, although it is the easiest to construct; but every such 
polynomial ring is mapped isomorphically onto R’ by the correspondence 


$$\sum_{i=0}^{n} a_i x^i \to (a_0, \ldots, a_n, 0, \ldots)$$


where the elements of R remain fixed and x is mapped onto 
z = (0, 1, 0, ...). The calculations given above in (2), (3) remain valid here, 
so that with l = max(n, m), $a_i = 0 = b_k$ for i > n, k > m we have 


(15) $$\sum_{i=0}^{n} a_i x^i + \sum_{i=0}^{m} b_i x^i = \sum_{i=0}^{l} (a_i + b_i) x^i, \qquad \Big( \sum_{i=0}^{n} a_i x^i \Big) \Big( \sum_{i=0}^{m} b_i x^i \Big) = \sum_{h=0}^{n+m} \Big( \sum_{i=0}^{h} a_i b_{h-i} \Big) x^h.$$

Moreover, it is easy to show that this is the only isomorphism with the 
desired properties. The various methods of construction are therefore 
completely equivalent to one another, so that we may speak of the 
polynomial ring in x over R. To denote this ring we shall use the symbol 
R[x]. 

If S is an arbitrary extension ring of R (the simplest case would be 
S = R), then every polynomial $\sum_{i=0}^{n} a_i x^i \in R[x]$ defines an entire rational 
function 


(16) $$u \to \sum_{i=0}^{n} a_i u^i$$


in S; for it follows from $\sum_{i=0}^{n} a_i x^i = \sum_{i=0}^{m} b_i x^i$ that $(a_0, \ldots, a_n, 0, \ldots) = 
(b_0, \ldots, b_m, 0, \ldots)$ and thus $\sum_{i=0}^{n} a_i u^i = \sum_{i=0}^{m} b_i u^i$, so that in fact the 
mapping (16) is uniquely defined by the polynomial alone. Moreover, the 
definition of addition and multiplication of polynomials is such that for 
every u ∈ S the mapping $\sum_{i=0}^{n} a_i x^i \to \sum_{i=0}^{n} a_i u^i$ is a homomorphism of 
the ring R[x] into the ring S, as is easily shown. The homomorphism 
determined by u in this way is often referred to as substitution of u (for the 
indeterminate x). In particular, if S is an extension ring of R[x] and if the 
function (16) is denoted by f, we have $\sum_{i=0}^{n} a_i x^i = f(x)$: the polynomial 
is the value, for the argument x, of the function corresponding to it, and 
therefore it is uniquely determined by f. In this way we obtain a one-to-one 
correspondence between the polynomials and all functions defined¹⁴ in 
an extension ring of R[x] (which may be R[x] itself) by the $a_0, \ldots, a_n \in R$ 
as in (16). Consequently, polynomials in x will be written below in the 
form f(x), where f is the function so defined. Then the results of §1.2, §1.3 
also hold for polynomials. The value f(u) of the function f (i.e., of the 
function corresponding to the polynomial f(x) for an argument u) in an 
extension ring of R[x] will be called, concisely though inexactly, the value 
of the polynomial f(x) at the point u. But this abbreviated way of speaking 
must not be allowed to conceal the fact that a polynomial over R is not 
necessarily a function defined in R (or in an extension ring of R). In any 
case, the above polynomials $(a_0, \ldots, a_n, 0, \ldots)$ by means of which we 
demonstrated the existence of polynomial rings, are not functions of this 
sort. Of course, it is possible that polynomials over R may also be functions 
in R. For example, in an infinite ring R without divisors of zero we may, 
by §1.1, §1.2, define the polynomial ring over R as the ring of entire 
rational functions in R with the identity function I as generator, provided 
we set the constant function $\underline{c}$ equal to c. But in many cases (though not 
in the case just mentioned) we must even then distinguish between the 
polynomial as a function in R and the function (16) corresponding to it. 
Under the assumptions just mentioned for R and the $\underline{c}$, not only I but also 
$I^2$ is a transcendent over R in R[I], so that we can form the ring $R[I^2]$ of 
polynomials in $I^2$ over R, and then of course the elements of this ring are 
entire rational functions in R. The polynomial $1 + I^2$ is then the function 
$u \to 1 + u^2$, whereas (16) assigns to it the function u → 1 + u, since we 
may set $a_0 = a_1 = 1$, n = 1. 
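The statement that substitution of u is a homomorphism can be spot-checked on an example. The following sketch is our own illustration: R is assumed to be the ring of integers, S the field of rational numbers (via `fractions.Fraction`), polynomials are coefficient lists with the lowest power first, and all function names are hypothetical. A check on sample values is of course not a proof.

```python
from fractions import Fraction

def poly_add(f, g):
    """Sum of two polynomials as in (15)."""
    length = max(len(f), len(g))
    f = f + [0] * (length - len(f))
    g = g + [0] * (length - len(g))
    return [a + b for a, b in zip(f, g)]

def poly_mul(f, g):
    """Product of two polynomials as in (15)."""
    if not f or not g:
        return []
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for k, b in enumerate(g):
            h[i + k] += a * b
    return h

def substitute(f, u):
    """Image of the polynomial under x -> u, i.e. the sum of a_i * u**i."""
    value = 0
    for a in reversed(f):
        value = value * u + a
    return value

f = [1, 0, 2]          # 1 + 2x^2
g = [3, 1]             # 3 + x
u = Fraction(2, 3)     # an element of the extension ring S

sum_ok = substitute(poly_add(f, g), u) == substitute(f, u) + substitute(g, u)
product_ok = substitute(poly_mul(f, g), u) == substitute(f, u) * substitute(g, u)
print(sum_ok, product_ok)
```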

This sharp distinction between a polynomial and an entire rational 
function is a necessary one from the logical point of view, but it is often 
disregarded in the various branches of mathematics; in many cases only 
one of the two concepts is actually needed but both names are used for it. 
There are historical reasons for this practice. Originally the word 
"polynomial" denoted any expression with several terms, and then more 
particularly an expression of the form $a_0 + a_1 x + \cdots + a_n x^n$ in powers of 
a variable x. Now it is common practice in analysis to use the word 
"function" not only for the function (= mapping) but also for its value 
at the point x (and similarly for the case of several arguments). Thus it 
became customary in analysis to use "polynomial" and "entire rational 
function" as synonyms, and this practice can be justified on the basis of 
our definitions, provided we make a strict distinction between a function 
and its value; for in analysis the coefficients are taken either from the field 
of real numbers or of complex numbers, and thus, since each of these fields 
contains infinitely many elements, every entire rational function can be 
interpreted, as remarked above, as a polynomial in I, where I is the 
identical mapping x → x. 


14 If the domain of definition of the function (16) is restricted to R, then in general 
the uniqueness of this correspondence is lost and can be restored only under certain 
special assumptions, e.g., that R has infinitely many elements and no divisors of zero 
(see §1.1, §1.2). 

It was Steinitz, in his fundamental work [1a] of the year 1910, who 
first introduced the precise concept of an indeterminate as an element that 
is transcendental over the domain of coefficients. What we call a 
"polynomial" is called by him an "entire rational function of the transcendent 
x"; in our present language it would have been more precise to call it the 
"value of an entire rational function at the point x." In a textbook [1] 
published in 1926, H. Hasse distinguishes between an "entire rational 
function in the sense of analysis" and an "entire rational function in the 
sense of algebra" (i.e., polynomials in our nomenclature). In the later 
textbooks on algebra (e.g., van der Waerden [1], Haupt [1]) the word 
"polynomial" is used exclusively for expressions $a_0 + \cdots + a_n x^n$ in a 
transcendent x (over the ring containing the $a_i$), but with the remark, 
in concession to the older usage, that the words "entire rational function" 
are also used. Finally, in Bourbaki [3] a polynomial is clearly distinguished 
from an entire rational function by the notation itself, although a function 
of this sort is given the name fonction polynome which emphasizes the 
close connection between the two concepts. 


2.2. Zeros 


In this section R is a commutative ring with unit element 1 and without 
divisors of zero. Then for $a_n \neq 0$, $b_m \neq 0$ it follows at once from (15) that 
$a_n b_m$ (≠ 0) is the coefficient of $x^{n+m}$ and is at the same time the leading 
coefficient of $\sum_{i=0}^{n} a_i x^i \sum_{i=0}^{m} b_i x^i$. Consequently, the product of two 
polynomials ≠ 0 is also ≠ 0, and its degree is the sum of the degrees of the two 
polynomials. In particular, R[x] has no divisors of zero. By complete 
induction on the number of factors it is easy to extend this theorem to 
products of several polynomials. 

By a zero¹⁵ of the polynomial f(x) ∈ R[x] we mean a zero of f in any 
extension ring of R[x].¹⁶ For a zero α of f(x) it follows from (4) that 
$f(x) = (x - \alpha) f_1(x)$. This equation naturally raises the question: for 
which natural numbers m do we have 


(17) $$f(x) = (x - \alpha)^m g(x)$$


for a g(x) (dependent on m) with g(x) ∈ R[x]? If f(x) = 0, then m may of 
course have any value, but if f(x) ≠ 0, our discussion shows that the 
degree of f(x) must be equal to the sum of m and the degree of g(x), so 
that m ≤ degree of f(x). Thus, for f(x) ≠ 0 there exists a greatest natural 
number m with an equation (17); this number is called the multiplicity of 
the zero α. It can also be characterized by (17) and g(α) ≠ 0. For on the 
one hand, if g(α) = 0, it follows from (4) that $g(x) = (x - \alpha) g_1(x)$ and 
thus by (17) that $f(x) = (x - \alpha)^{m+1} g_1(x)$; and on the other hand, from 
(17) and $f(x) = (x - \alpha)^{m'} h(x)$ with m' > m we have 


$$(x - \alpha)^m \big( g(x) - (x - \alpha)^{m'-m} h(x) \big) = 0,$$


and also from x − α ≠ 0 and the absence of divisors of zero,¹⁷ 
$g(x) = (x - \alpha)^{m'-m} h(x)$ and consequently g(α) = 0. Thus: 

If the nonzero polynomial f(x) has s distinct zeros $\alpha_1, \ldots, \alpha_s$ with 
multiplicities $m_1, \ldots, m_s$, then there exists a polynomial h(x) with 


(18) $$f(x) = h(x) \prod_{i=1}^{s} (x - \alpha_i)^{m_i}.$$

The proof is by complete induction on s. For s = 1 the result (18) follows 
at once from the definition of $m_1$. Now let us assume (18) and let α be 
another zero (≠ $\alpha_1, \ldots, \alpha_s$) of f(x). Since R has no divisors of zero, it 
follows that α is then a zero of h(x), so that $h(x) = (x - \alpha)^m g(x)$, 
g(α) ≠ 0, m ≥ 1. Setting $g^{*}(x) = g(x) \prod_{i=1}^{s} (x - \alpha_i)^{m_i}$, we then have 
$f(x) = (x - \alpha)^m g^{*}(x)$, $g^{*}(\alpha) \neq 0$, so that m is the multiplicity of α as 
a zero of f(x). Substitution of $(x - \alpha)^m g(x)$ for h(x) in (18) then provides 
the statement necessary for the induction. By comparing the degrees on 
both sides of (18) we obtain a sharpening of the theorem in §1.2: 

The sum of the multiplicities of the zeros of a nonzero polynomial is not 
greater than its degree. 


15 Instead of "zero" the word "root" is often used. This meaning of "root" is of 
course different from the concept of an nth root in the field of real numbers (IB1, §4.7). 
The connection between the two concepts lies in the fact that the nth root of a is a root 
of the polynomial $x^n - a$. 

16 We could write R here in place of R[x], but then we would have to change our 
notation, since then the function (16) formed for a polynomial in an extension ring of R 
no longer determines the polynomial uniquely in every case. 

17 Here and in the preceding discussion we could dispense with the postulate of 
absence of divisors of zero, since the highest coefficient of x − α is equal to 1. 


In the notation of §1.3 it follows readily from (7) that the multiplicity 
m of a zero α is characterized by the equations $f_i(\alpha) = 0$ for 
0 ≤ i < m, $f_m(\alpha) \neq 0$. In particular, α is a multiple zero (i.e., m > 1), 
if and only if $f(\alpha) = f_1(\alpha) = 0$. Instead of $f_1(x)$ it is often more convenient 
to make use of the derivative f′(x) of f(x), defined as follows:¹⁸ 


(19) $$f'(x) = \sum_{i=1}^{n} i\,a_i x^{i-1};$$


of course (since R does not necessarily contain the natural numbers), $i a_i$ 
is to be interpreted here as $\sum_{k=1}^{i} a_i$ (i.e., as the sum of i summands, each of 
which is $= a_i$). If in the definition (5) of $f_1(x)$ we now replace α by x, then 
$f_1(x)$ becomes f′(x), as is easily shown by changing the order of summation 
in (5); in particular, we thus have $f_1(\alpha) = f'(\alpha)$. It is obvious from (19) 
that the derivative of a sum of polynomials is equal to the sum of their 
derivatives. The rule for the derivative of a product, namely 


(20) $$h'(x) = f'(x)\,g(x) + f(x)\,g'(x), \quad \text{if } h(x) = f(x)\,g(x),$$


can of course be proved from (19), but we shall give a simpler proof in 
§3.1. By complete induction on n we then obtain from (20) 


(20′) $$f'(x) = \sum_{k=1}^{n} g_k'(x) \prod_{i \neq k} g_i(x), \quad \text{if } f(x) = \prod_{i=1}^{n} g_i(x).$$


In particular, if we set $g_i(x) = x - \alpha_i$, then from (20′) we have 


$$f'(x) = \sum_{k=1}^{n} \prod_{i \neq k} (x - \alpha_i).$$

If R, and thus also R[x], has no divisors of zero, we can form $(x - \alpha_k)^{-1}$ 
in the quotient field R(x) (see §2.3) and then obtain 


$$f'(x) = f(x) \sum_{k=1}^{n} (x - \alpha_k)^{-1}.$$
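The criterion for multiple zeros and the definition (19) can be illustrated on a small example. This is a sketch under assumptions of our own: R is the ring of integers (so that $i a_i$ is an ordinary product), polynomials are coefficient lists with the lowest power first, and the names `derivative`, `evaluate`, `divide_by_linear`, and `multiplicity` are hypothetical.

```python
def derivative(f):
    """Formal derivative (19); f lists a_0, ..., a_n, lowest power first."""
    return [i * a for i, a in enumerate(f)][1:]

def evaluate(f, u):
    """Value of the polynomial at u, computed by Horner's rule."""
    value = 0
    for a in reversed(f):
        value = value * u + a
    return value

def divide_by_linear(f, alpha):
    """Return (f_1, f(alpha)) with f(x) = (x - alpha) * f_1(x) + f(alpha)."""
    high_first = list(reversed(f))
    row = [high_first[0]]
    for a in high_first[1:]:
        row.append(row[-1] * alpha + a)
    return list(reversed(row[:-1])), row[-1]

def multiplicity(f, alpha):
    """Largest m with f(x) = (x - alpha)**m * g(x); f must be nonzero."""
    m = 0
    while f:
        quotient, remainder = divide_by_linear(f, alpha)
        if remainder != 0:
            break
        f, m = quotient, m + 1
    return m

# f(x) = (x - 1)^2 (x + 2) = x^3 - 3x + 2
f = [2, -3, 0, 1]
print(multiplicity(f, 1), evaluate(derivative(f), 1))    # double zero
print(multiplicity(f, -2), evaluate(derivative(f), -2))  # simple zero
```

At the double zero both f and its derivative vanish, while at the simple zero the derivative does not, in agreement with the characterization f(α) = f_1(α) = 0 and f_1(α) = f′(α).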


2.3. Polynomials in Several Indeterminates 


We again let R be a commutative ring with unit element 1. As a 
generalization of the concepts in §2.1, the elements $x_1, \ldots, x_n$ of an 
extension ring R’ of R are now called independent transcendents, or also 
indeterminates, if an equation¹⁹ of the form 


(21) $$\sum_{0 \le i_k \le m} a_{i_1 \ldots i_n} x_1^{i_1} \cdots x_n^{i_n} = 0 \qquad (a_{i_1 \ldots i_n} \in R)$$


implies $a_{i_1 \ldots i_n} = 0$ for all indices. Every subring of R’ which contains R 
and all the $x_i$ also contains all expressions of the form on the left-hand 
side of (21), and it is easy to show that these expressions again form a ring. 
This ring is called the polynomial ring $R[x_1, \ldots, x_n]$ in the indeterminates 
$x_1, \ldots, x_n$ over R and its elements are called polynomials²⁰ in the $x_1, \ldots, x_n$ 
over R. For n = 1 this is obviously the definition in §2.1. For n > 1 we 
have 


(22) $$R[x_1, \ldots, x_n] = R[x_1, \ldots, x_{n-1}][x_n].$$


18 In this definition of the derivative no use is made of the concept of a limit, but if R 
is the field of real numbers, then the definition of f′(x) by the usual limiting process 
produces exactly the same result as here. 


To prove this statement we denote $R[x_1, \ldots, x_{n-1}]$ by S, and $R[x_1, \ldots, x_n]$ 
by T. Then obviously S ⊂ T. From $\sum_{i=0}^{m} u_i x_n^i = 0$ ($u_i \in S$), if we express 
the $u_i$ in terms of the $x_1, \ldots, x_{n-1}$, we obtain an equation of the form (21), 
whose coefficients are thus all = 0, which means that $u_i = 0$. Consequently, 
$x_n$ is an indeterminate over S, so that in T we can form the polynomial 
ring $S[x_n]$. Conversely, for 


$$u_i = \sum_{0 \le i_k \le m} a_{i_1 \ldots i_{n-1} i}\, x_1^{i_1} \cdots x_{n-1}^{i_{n-1}} \qquad (i = 0, \ldots, m)$$

we at once obtain 


$$\sum_{0 \le i_k \le m} a_{i_1 \ldots i_n} x_1^{i_1} \cdots x_n^{i_n} = \sum_{i=0}^{m} u_i x_n^i, \qquad u_i \in S,$$

and therefore $T \subset S[x_n]$, which completes the proof of (22). 

By (22) we have reduced the construction of a polynomial ring in several 
indeterminates to the successive construction of polynomial rings in one 
indeterminate. Thus for every R and for every natural number n there 
exists a polynomial ring over R in n indeterminates. 


The a; ,; in 
lesthy 


(23) VS > Op gtr nie: SE RK an: X 5) 


are called the coefficients of the polynomial y. They are uniquely deter- 
mined by y, since a second representation (23) would lead, when subtracted 


19 Strictly speaking, we should also write $k = 1, \dots, n$ below the sign of summation;
this summation is taken over all $n$-tuples $(i_1, \dots, i_n)$ with $0 \le i_k \le m$ ($k = 1, \dots, n$).

20 The same term is sometimes used (see, e.g., IB5, §3.9) even when the $x_i$ are not
independent transcendents.
from the first one, to an equation of the form (21). For each coefficient
$a_{i_1 \dots i_n} \ne 0$ in $y$ the number $i_1 + \cdots + i_n$ is called the degree of the term
$a_{i_1 \dots i_n} x_1^{i_1} \cdots x_n^{i_n}$. For $y \ne 0$ the maximum of the degrees of the individual
terms (with coefficient $\ne 0$) is the degree of $y$ in the $x_1, \dots, x_n$. This degree
is to be distinguished from the degree of $y$ in $x_i$, which is defined as the
degree of $y$ as a polynomial in $x_i$ over the ring of polynomials in the other
indeterminates. Thus $x_1 x_2^2 + x_3$ is of degree 3 in $x_1, x_2, x_3$, of degree 2
in $x_2$, and of degree 1 in $x_1$ and $x_3$. If all the terms of $y$ (with coefficient
$\ne 0$) have the same degree, then $y$ is said to be homogeneous. Thus the
homogeneous polynomials of degree 1 have the form $\sum_i a_i x_i$.
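
The three notions of degree can be made concrete with a small computation. The following is only a sketch: the dictionary representation (exponent tuple mapped to coefficient) and the function names are assumptions made for this illustration, not notation from the text. The polynomial used is $x_1 x_2^2 + x_3$.

```python
# A polynomial in n indeterminates is stored as {exponent tuple: coefficient}.
# This representation is an assumption of the sketch, not the book's notation.
poly = {(1, 2, 0): 1, (0, 0, 1): 1}   # x1*x2^2 + x3 in R[x1, x2, x3]


def total_degree(p):
    """Maximum of i_1 + ... + i_n over the terms with nonzero coefficient."""
    return max(sum(e) for e, c in p.items() if c != 0)


def degree_in(p, k):
    """Degree of p in the (k+1)-th indeterminate, the others treated as coefficients."""
    return max(e[k] for e, c in p.items() if c != 0)


def is_homogeneous(p):
    """True if all terms with nonzero coefficient have the same degree."""
    return len({sum(e) for e, c in p.items() if c != 0}) == 1


print(total_degree(poly))                       # 3
print([degree_in(poly, k) for k in range(3)])   # [1, 2, 1]
print(is_homogeneous(poly))                     # False
```

A linear form such as `{(1, 0, 0): 2, (0, 0, 1): -5}` passes `is_homogeneous`, in agreement with the last sentence above.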

As in §2.1 for the case $n = 1$, we now assign to each polynomial
$y \in R[x_1, \dots, x_n]$ a function $f$ of $n$ arguments in an extension $S$ of
$R[x_1, \dots, x_n]$ by defining $f(u_1, \dots, u_n)$ for $u_1, \dots, u_n \in S$ as the element that
arises on the right side of (23) by substitution²¹ of $u_i$ for $x_i$. In particular,
we then have $y = f(x_1, \dots, x_n)$, so that this correspondence is one-to-one.
If $u_1, \dots, u_n \in R$, then $f(u_1, \dots, u_n)$ is also in $R$, so that if we restrict the
domain of definition, $f$ becomes a function of $n$ arguments in $R$. The
functions defined in this way are called the entire rational functions of $n$
arguments in $R$.

By complete induction on $n$ it follows from (22), in view of the theorem
at the beginning of §2.2, that if $R$ has no divisors of zero, then $R[x_1, \dots, x_n]$
has none either. Thus if $R$ has no divisors of zero, then by IB1, §3.2 we
can form the quotient field of $R[x_1, \dots, x_n]$, whose elements are therefore
of the form $y/z$ with $y, z \in R[x_1, \dots, x_n]$, $z \ne 0$. A quotient field of this
sort is denoted by $R(x_1, \dots, x_n)$. In view of the close connection between
polynomials and entire rational functions, this field is usually called the
field of rational functions over $R$, although its elements, being defined as
quotients of polynomials, are in general not functions. But of course, to
every element $f(x_1, \dots, x_n)/g(x_1, \dots, x_n)$ of $R(x_1, \dots, x_n)$ there corresponds
a rational function in $R$,

$(u_1, \dots, u_n) \to f(u_1, \dots, u_n)/g(u_1, \dots, u_n)$;

its domain of definition consists of the $n$-tuples $(u_1, \dots, u_n)$ with
$u_1, \dots, u_n \in R$, $g(u_1, \dots, u_n) \ne 0$, and its values are in the quotient field
of $R$.

The remarks at the end of §2.1 about the terminology “polynomial”
and “entire rational function” apply equally well here to the case of
several transcendents and to the terms “polynomial quotient” and
“rational function,” which again are often used synonymously, as is clear
from the choice of name for the quotient field of a polynomial ring. A


21 As in the case $n = 1$, this substitution is possible even if $S$ is only an extension of $R$
and not necessarily of $R[x_1, \dots, x_n]$.
striking result of this practice is the fact that the elements of an algebraic
extension field (see IB7, §2) over $R(x_1, \dots, x_n)$, where $R$ is a field, are
called algebraic functions over $R$. Of course, these functions in the sense
of algebra are not necessarily functions at all; in particular, they are
certainly not algebraic functions in the sense of the theory of functions of a
complex variable (see III6, §5). On the other hand, the latter functions can
always be regarded as algebraic functions in the sense of algebra.


2.4. Symmetric Polynomials 


An important special case of (18) arises when $h(x) \in R$. Comparison of
degrees then shows that $\sum_i m_i = n$, where $n$ is the degree of $f(x)$. Since
multiplicities will play no role in what follows, we index the zeros from 1 to
$n$ in such a way that the number of times each zero appears is equal to its
multiplicity; we then have


(24)   $f(x) = c(x - \alpha_1) \cdots (x - \alpha_n)$.


By multiplying out on the right-hand side [see IB1, (41')] we see that the
coefficient $a_{n-i}$ of $x^{n-i}$ in $f(x)$, with $a_n = c$, satisfies the equation


(25)   $a_{n-i} = (-1)^i c \sum_{0 < k_1 < \cdots < k_i \le n} \alpha_{k_1} \cdots \alpha_{k_i} \qquad (i = 1, \dots, n)$.


The notation under the summation sign indicates that the summation
is to be taken over the set of all $i$-tuples $(k_1, \dots, k_i)$ with positive integers
$k_h \le n$ ($h = 1, \dots, i$) and $k_h < k_{h+1}$ ($h = 1, \dots, i - 1$). This relationship
between the coefficients and the zeros of a polynomial in the case (24)
suggests that in the polynomial ring $R[x_1, \dots, x_n]$ (where $R$ is an arbitrary
commutative ring with unit element) we should pay special attention to
the polynomials


(26)   $\sigma_i(x_1, \dots, x_n) = \sum_{0 < k_1 < \cdots < k_i \le n} x_{k_1} \cdots x_{k_i} \qquad (i = 1, \dots, n)$,

where in particular

$\sigma_1(x_1, \dots, x_n) = x_1 + \cdots + x_n, \qquad \sigma_n(x_1, \dots, x_n) = x_1 \cdots x_n$.


Then (25) can be written in the form $a_{n-i} = (-1)^i c\,\sigma_i(\alpha_1, \dots, \alpha_n)$. Obviously
$\sigma_i(x_1, \dots, x_n)$ is homogeneous of degree $i$, and has the further property,
immediately obvious from (26), that it is left unchanged by an arbitrary
permutation of the $x_1, \dots, x_n$. Polynomials with this property are called
symmetric. From the uniqueness of the coefficients it follows that a
polynomial is symmetric if and only if each coefficient is left unchanged
by an arbitrary permutation of the indices. Now the polynomials
$\sigma_i(x_1, \dots, x_n)$ are of basic importance for all symmetric polynomials, in
the following sense:


For every symmetric polynomial $f(x_1, \dots, x_n)$ from $R[x_1, \dots, x_n]$ there
exists²² a polynomial $F(x_1, \dots, x_n)$ in $R[x_1, \dots, x_n]$, such that


$f(x_1, \dots, x_n) = F(\sigma_1(x_1, \dots, x_n), \dots, \sigma_n(x_1, \dots, x_n))$.


For this reason the $\sigma_i(x_1, \dots, x_n)$ are called the elementary symmetric
polynomials²³ in the $x_1, \dots, x_n$, and the above theorem is called the
fundamental theorem of the elementary symmetric polynomials; this theorem
states that every symmetric polynomial can be expressed as an entire
rational expression in the elementary symmetric polynomials.
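
The relation (25) between coefficients and zeros can be checked numerically. The sketch below multiplies out the product (24) for integer zeros and compares each coefficient with the corresponding elementary symmetric polynomial; the function names and the sample values are chosen for this illustration only.

```python
from itertools import combinations
from math import prod


def elementary_symmetric(values, i):
    """sigma_i: sum of the products over all index sets k_1 < ... < k_i."""
    return sum(prod(c) for c in combinations(values, i))


def coefficients_from_zeros(c, zeros):
    """Coefficients [a_0, ..., a_n] of c*(x - z_1)...(x - z_n), found by multiplying out."""
    coeffs = [c]                                    # the constant polynomial c
    for z in zeros:
        shifted = [0] + coeffs                      # multiplication by x
        scaled = [-z * a for a in coeffs] + [0]     # multiplication by -z
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs


c, zeros = 2, [1, 1, -3, 4]          # the zero 1 is listed twice (multiplicity 2)
a = coefficients_from_zeros(c, zeros)
n = len(zeros)
for i in range(1, n + 1):
    # relation (25): a_(n-i) = (-1)^i * c * sigma_i(zeros)
    assert a[n - i] == (-1) ** i * c * elementary_symmetric(zeros, i)
print(a)
```

Listing a multiple zero as often as its multiplicity, as the text prescribes, is what allows the same formula to cover repeated zeros.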

For the proof we choose a natural number $g > 1$ and confine our
attention to the symmetric polynomials $f(x_1, \dots, x_n) \ne 0$ of degree $< g$.
As indices for the coefficients we will then have only $n$-tuples $(i_1, \dots, i_n)$
with $0 \le i_k < g$ ($k = 1, \dots, n$). By the mapping


$(i_1, \dots, i_n) \to \sum_{k=1}^{n} i_k g^{n-k}$


this set of $n$-tuples is put into one-to-one correspondence with the set of
non-negative integers $< g^n$; the image of an $n$-tuple will be called its
numeral.²⁴ Let the greatest numeral of an $n$-tuple $(i_1, \dots, i_n)$ with $a_{i_1 \dots i_n} \ne 0$
in $f(x_1, \dots, x_n)$ be denoted by $h$. By the principle of complete induction
(see IB1, §1.4) it is sufficient to prove the assertion for $f(x_1, \dots, x_n)$ under
the assumption that it is already known to be correct for all symmetric
polynomials whose nonzero coefficients have an index numeral $< h$.
If $(i_1, \dots, i_n)$ is the $n$-tuple with numeral $h$, it follows that $i_k \ge i_{k+1}$
($k = 1, \dots, n - 1$), for if $i_k < i_{k+1}$, then the $n$-tuple arising from $(i_1, \dots, i_n)$
by interchange of $i_k$ with $i_{k+1}$ would have a numeral $> h$, so that its
coefficient would necessarily be zero, whereas in view of the symmetry of
$f(x_1, \dots, x_n)$ this coefficient is $= a_{i_1 \dots i_n} \ne 0$. Abbreviating $\sigma_i(x_1, \dots, x_n)$
to $\sigma_i$, we now write down the obviously symmetric polynomial


(27)   $f^*(x_1, \dots, x_n) = f(x_1, \dots, x_n) - a_{i_1 \dots i_n} \prod_{k=1}^{n-1} \sigma_k^{\,i_k - i_{k+1}} \cdot \sigma_n^{\,i_n}$.


22 It can be proved that there is “exactly one” such polynomial. See, e.g.,
van der Waerden [2], §29.

23 The entire functions corresponding to them (see the end of §2.3) are called the
elementary symmetric functions.

24 It is obvious that in this enumeration the $n$-tuples are arranged in lexicographic
order (see IB1, §4.1).
Since $\sigma_k$ is of degree $k$ and


$\sum_{k=1}^{n-1} k(i_k - i_{k+1}) + n i_n = \sum_{k=1}^{n} i_k < g$,


the polynomial $f^*(x_1, \dots, x_n)$ is also of degree $< g$. In a product of
$\sigma_i$-factors we can find the term $\ne 0$ with greatest index numeral by
choosing, for each factor $\sigma_i$ in (26), the summand with least indices
$k_1, \dots, k_i$ (in other words, $x_1 \cdots x_i$) and then multiplying these summands
together. For the product subtracted in (27) this rule gives


$a_{i_1 \dots i_n} \prod_{k=1}^{n-1} (x_1 \cdots x_k)^{i_k - i_{k+1}} (x_1 \cdots x_n)^{i_n}$,


or, in other words, in view of $\sum_{k=j}^{n-1} (i_k - i_{k+1}) = i_j - i_n$, precisely the term
$a_{i_1 \dots i_n} x_1^{i_1} \cdots x_n^{i_n}$. Thus every coefficient $\ne 0$ in $f^*(x_1, \dots, x_n)$ has an index
numeral $< h$, so that the induction hypothesis can be applied to this
polynomial: $f^*(x_1, \dots, x_n) = F^*(\sigma_1, \dots, \sigma_n)$. Then from (27) the desired
result follows at once for $f(x_1, \dots, x_n)$. The proof provides a method for
actually calculating the $F$, for example:²⁵


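$y = x_1^3 + x_2^3 + x_3^3$,

$y - \sigma_1^3 = -3x_1^2 x_2 - 3x_1^2 x_3 - 3x_1 x_2^2 - 3x_1 x_3^2 - 3x_2^2 x_3 - 3x_2 x_3^2 - 6x_1 x_2 x_3 = z$,

$z + 3\sigma_1 \sigma_2 = 3x_1 x_2 x_3 = 3\sigma_3$,

$y = \sigma_1^3 - 3\sigma_1 \sigma_2 + 3\sigma_3$.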
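
The method of the proof translates directly into a procedure: repeatedly take the term with the greatest index numeral (the lexicographically greatest exponent tuple), subtract the product of elementary symmetric polynomials prescribed by (27), and record that product. The following is a minimal sketch over the integers; the dictionary representation and all function names are assumptions of this illustration.

```python
from itertools import combinations


def poly_mul(p, q):
    """Product of two polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}


def sigma(i, n):
    """Elementary symmetric polynomial sigma_i in x_1, ..., x_n, as in (26)."""
    return {tuple(1 if k in ks else 0 for k in range(n)): 1
            for ks in combinations(range(n), i)}


def to_elementary(f, n):
    """Return {(e_1, ..., e_n): coeff}, meaning f = sum coeff * sigma_1^e_1 ... sigma_n^e_n."""
    f = {e: c for e, c in f.items() if c != 0}
    result = {}
    while f:
        lead = max(f)                  # tuple with the greatest index numeral
        a = f[lead]
        # exponents i_k - i_(k+1) for sigma_k (k < n) and i_n for sigma_n, as in (27)
        exps = tuple(lead[k] - lead[k + 1] for k in range(n - 1)) + (lead[n - 1],)
        if min(exps) < 0:
            raise ValueError("input polynomial is not symmetric")
        result[exps] = a
        product = {(0,) * n: a}
        for k, e in enumerate(exps):
            for _ in range(e):
                product = poly_mul(product, sigma(k + 1, n))
        for e, c in product.items():   # form f* = f - a * (product of sigmas)
            f[e] = f.get(e, 0) - c
            if f[e] == 0:
                del f[e]
    return result


y = {(3, 0, 0): 1, (0, 3, 0): 1, (0, 0, 3): 1}     # x1^3 + x2^3 + x3^3
print(to_elementary(y, 3))
```

For the example above the result has the entries `(3, 0, 0): 1`, `(1, 1, 0): -3` and `(0, 0, 1): 3`, that is, $\sigma_1^3 - 3\sigma_1\sigma_2 + 3\sigma_3$. Python compares tuples lexicographically, which by footnote 24 is the same order as that of the numerals.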


This procedure also enables us to solve the following problem: under
the assumption (24) with $c = 1$ and with a given entire rational function
$g$ of one argument in $R$ it is required to calculate the coefficients of the
polynomial $\prod_{i=1}^{n} (x - g(\alpha_i))$ in terms of those of $f(x)$. For this purpose
we represent the symmetric polynomial $\sigma_i(g(x_1), \dots, g(x_n))$ in the form
$F_i(\sigma_1, \dots, \sigma_n)$ and then obtain the desired coefficient of $x^{n-i}$ in the form


$(-1)^i F_i(-a_{n-1}, \dots, (-1)^n a_0)$.


2.5. Power Series 


In analysis, an important role is played by power series (or sums of
power series) in the variable $x$; that is, by expressions of the form $\sum_{k=0}^{\infty} a_k x^k$.


25 A simpler method for this case is given at the end of §2.5.
Their addition and multiplication proceeds, provided $x$ lies inside the
circle of convergence, according to the formulas:


(28)   $\sum_{k=0}^{\infty} a_k x^k + \sum_{k=0}^{\infty} b_k x^k = \sum_{k=0}^{\infty} (a_k + b_k) x^k$,

       $\sum_{k=0}^{\infty} a_k x^k \cdot \sum_{k=0}^{\infty} b_k x^k = \sum_{k=0}^{\infty} \Bigl( \sum_{i=0}^{k} a_i b_{k-i} \Bigr) x^k$.


A power series is determined by its sequence of coefficients, i.e., by the
mapping $k \to a_k$, and conversely the sequence of coefficients is determined
by the power series (more precisely, by the corresponding function)
if the radius of convergence is $\ne 0$ (cf. III7, §1.3). But in algebraic applications
of power series we are often interested, not in the numerical values
obtained by inserting some $x$-value (from the circle of convergence),
but only in the sequences of coefficients and their combinations according
to (28). Then the restriction to real or complex numbers and the
consideration of questions of convergence is not only superfluous but
even troublesome, since it is frequently convenient to deal with a sequence
as though it were the sequence of coefficients of a power series in a ring
that is not a subring of the field of complex numbers. Just as in the
construction of the polynomial ring in §2.1, it is desirable here to calculate
with the sequences themselves. In order to be able to divide by power
series, we must also consider series of the form $\sum_{k=h}^{\infty} a_k x^k$ (with negative
integer $h$), for which the equations (28) must be slightly generalized.

These remarks suggest the following definitions. Let $R$ be a commutative
ring with unit element 1. We consider the mappings $a$ of the set of integers
into $R$ (with $a_k$ as the image of $k$ in $R$²⁶) and confine our attention to
mappings $a$ with the property that there exists an integer $h$ with $a_k = 0$
for $k < h$. In the set $R^*$ of these mappings we can define an addition and
multiplication, corresponding to (12), as follows:


(29)   $(a + b)_k = a_k + b_k, \qquad (ab)_k = \sum_{i=h}^{k-h} a_i b_{k-i}$,


if $a_l = b_l = 0$ for all $l < h$. Then (cf. IB1, §1.6) for $h > k - h$, or $k < 2h$,
the sum $= 0$, and, in general, this sum is independent of $h$, provided only
$a_l = b_l = 0$ for all $l < h$. Now, exactly as for $R'$ in §2.1, it is easy to see
that $R^*$ is a commutative ring. To each $a \in R$ we assign the mapping
$\bar{a}$ with $\bar{a}_0 = a$, $\bar{a}_k = 0$ for $k \ne 0$, so that $R$ is mapped isomorphically
into $R^*$. Thus we may identify $\bar{a}$ with $a$ (i.e., set $\bar{a} = a$), whereby $R^*$ becomes an


26 We shall use this notation below, even when the mapping is not denoted by a single
letter.
extension ring of $R$. Letting $x$ denote the mapping which sends 1 into 1
and all other integers into 0, we see that $x^i$ ($i \ge 0$) is the mapping which
sends $i$ into 1 and all other integers into 0. Then it is clear that $(\sum_{i=0}^{n} a_i x^i)_k$
(i.e., the image of $k$ under the mapping $\sum_{i=0}^{n} a_i x^i$, where it is to be noted
that $a_i = \bar{a}_i$) is equal to $a_k$ for $k = 0, \dots, n$ and is otherwise 0. In particular,
$\sum_{i=0}^{n} a_i x^i = 0$ implies $a_i = 0$, so that $x$ is a transcendent over $R$. Thus
we can construct the polynomial ring $R[x]$ as a subring of $R^*$.

In the particular case that $R$ is a field, the extension ring $R^*$ also contains
the quotient field $R(x)$ and is thus itself a field. For the proof of this
assertion we first note that the mapping which sends $-1$ into 1 and other
integers into 0 is the inverse $x^{-1}$ of $x$ with respect to multiplication. Now
for $a \ne 0$ in $R^*$ there exists by assumption an integer $h$ such that $a_h \ne 0$,
$a_k = 0$ for $k < h$. With $b = ax^{-h}$ we then have $b_0 = a_h$, $b_k = 0$ for
$k < 0$. Thus it remains only to construct an inverse $c$ for $b$. But by (29)
we obtain such an inverse if we set $c_k = 0$ for $k < 0$ and calculate the
remaining $c_k$ from the equations


$\sum_{i=0}^{k} b_i c_{k-i} = 1$ for $k = 0$, and $= 0$ for $k > 0$,


i.e., for $k = 0, 1, 2, \dots$


$b_0 c_0 = 1$,
$b_0 c_1 + b_1 c_0 = 0$,
$b_0 c_2 + b_1 c_1 + b_2 c_0 = 0$,
$\dots$


Since $b_0 \ne 0$, it is obvious that this system of equations can be satisfied
with elements $c_0, c_1, \dots \in R$, which completes the proof that $R^*$ is a field.
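
The system of equations for the $c_k$ can be solved one coefficient at a time, since the $k$-th equation contains only $c_0, \dots, c_k$. The sketch below does this over the field of rational numbers; the function name and the use of Python's `Fraction` type are choices made for this illustration.

```python
from fractions import Fraction


def inverse_coefficients(b, terms):
    """First `terms` coefficients c_0, c_1, ... of the inverse of b_0 + b_1 x + ...

    `b` lists b_0, b_1, ... (coefficients not listed count as 0); b_0 must be nonzero.
    """
    b = [Fraction(v) for v in b]
    if b[0] == 0:
        raise ValueError("b_0 must be nonzero")
    c = [1 / b[0]]                                   # b_0 c_0 = 1
    for k in range(1, terms):
        # b_0 c_k + b_1 c_(k-1) + ... + b_k c_0 = 0, solved for c_k
        s = sum(b[i] * c[k - i] for i in range(1, min(k, len(b) - 1) + 1))
        c.append(-s / b[0])
    return c


# Inverse of 1 - x - x^2; its coefficients are the Fibonacci numbers.
print([int(v) for v in inverse_coefficients([1, -1, -1], 8)])   # [1, 1, 2, 3, 5, 8, 13, 21]
```

The only division that occurs is by $b_0$, which is why the hypothesis $b_0 \ne 0$ in a field is all that the argument needs.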

To every element $a$ of $R^*$ we now assign a rational number, which we
call²⁷ the value $|a|$ of $a$: $|0| = 0$, $|a| = 2^{-h}$, if $a_h \ne 0$ and $a_k = 0$ for
all $k < h$. Since


$\Bigl( a - \sum_{i=h}^{n} a_i x^i \Bigr)_k = a_k - \Bigl( \sum_{i=h}^{n} a_i x^i \Bigr)_k = 0$

for all $k \le n$ and for every $a \in R^*$, we have


$\Bigl| a - \sum_{i=h}^{n} a_i x^i \Bigr| < 2^{-n}$.


27 This valuation has nothing to do with an ordering, as is the case, say, for the
absolute value of the rational and the real numbers (cf. IB1, §3.4). We are only
interested in the fact that the value here has the properties IB1, (52), (53), as is
easily shown; see also van der Waerden [2], §§74, 75.
If now by means of this valuation we define the concept of a limit in the
usual way, then $a = \lim_{n \to \infty} \sum_{i=h}^{n} a_i x^i$. Thus, as is customary in the
theory of infinite series, we write


$a = \sum_{i=h}^{\infty} a_i x^i$


and call $R^*$ the power-series ring in $x$ over $R$, or the power-series field, if $R$
(and therefore $R^*$) is a field.

As a result of this construction of R*, the purely algebraic properties 
of power series (addition, multiplication, formation of inverses) can be 
investigated in a purely algebraic way, i.e., independently of the special 
properties of the field of real (or complex) numbers of analysis. But it must 
be pointed out that the above concept of a limit for R* does not coincide 
with the concept of a limit for real (or complex) numbers if the $a_i$ are
such numbers and x is replaced by such a number. 

If $R$ is ordered, the power series ring $R^*$ can be ordered in a very simple
way. As domain of positivity we take the set of all power series $a \ne 0$ for
which the leading coefficient $a_h$ (i.e., $a_h \ne 0$, $a_k = 0$ for $k < h$) is
positive. The properties of a domain of positivity [IB1, (44)] are easily
verified. Since all the positive elements of $R$ obviously belong to the
domain of positivity just defined, the order in $R^*$ determines the same
order for the elements of $R$ as they are assumed to have in the first place.
In view of the fact that for every natural number $n$ the element $x^i - nx^k$,
with $i < k$, has the leading coefficient 1, so that $x^i > nx^k$, we see that the
ordering is non-Archimedean (cf. IB1, §4.3), since $x^k$ is infinitesimal with
respect to $x^i$.

As an example of the usefulness of these power series which we have just
introduced in a purely algebraic way, we shall consider the problem
(see §2.4) of expressing the power sums $s_i = \sum_{k=1}^{n} x_k^i$ ($i = 1, 2, \dots$) as
entire rational functions of the elementary symmetric polynomials
$\sigma_i(x_1, \dots, x_n)$, for which we again write simply $\sigma_i$. For an arbitrary
commutative ring $R$ with unit element 1, we construct the power-series
ring in $x$ over the polynomial ring $R[x_1, \dots, x_n]$. By the definition of the $\sigma_i$
the polynomial $f(x) = \prod_{k=1}^{n} (1 - x_k x)$ over $R[x_1, \dots, x_n]$ can be written
in the form (with $\sigma_0 = 1$):


(30)   $f(x) = 1 + \sum_{i=1}^{n} (-1)^i \sigma_i(x_1 x, \dots, x_n x) = \sum_{i=0}^{n} (-1)^i \sigma_i x^i$.


In the power-series ring we can now prove

(31)   $-f'(x) = f(x) \sum_{i=0}^{\infty} s_{i+1} x^i$.


28 Of course, this ordering does not lead [by IB1, (51)] to the valuation introduced above.
For by the definition of $f(x)$, of the $s_{i+1}$, and of addition as in (29), the
right side of (31) is


$\sum_{k=1}^{n} \Bigl( x_k (1 - x_k x) \sum_{i=0}^{\infty} x_k^i x^i \prod_{l \ne k} (1 - x_l x) \Bigr)$.


By (29) we have

$(1 - x_k x) \sum_{i=0}^{\infty} x_k^i x^i = \sum_{i=0}^{\infty} x_k^i x^i - \sum_{i=0}^{\infty} x_k^{i+1} x^{i+1} = 1$,


so that for the right side of (31) we obtain the expression
$\sum_{k=1}^{n} x_k \prod_{l \ne k} (1 - x_l x)$, which is seen at once from (20') to be equal to
$-f'(x)$. Then comparison of coefficients in (30) and (31) leads to


(32)   $m\sigma_m + \sum_{k=1}^{m} (-1)^k \sigma_{m-k} s_k = 0$ for $m = 1, \dots, n$,

       $\sum_{k=m-n}^{m} (-1)^k \sigma_{m-k} s_k = 0$ for $m > n$.
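In particular, for $m = 1, 2, 3$ and $n \ge 3$:


$\sigma_1 - s_1 = 0$,
$2\sigma_2 - \sigma_1 s_1 + s_2 = 0$,
$3\sigma_3 - \sigma_2 s_1 + \sigma_1 s_2 - s_3 = 0$,


so that $s_1 = \sigma_1$, $s_2 = \sigma_1^2 - 2\sigma_2$, $s_3 = \sigma_1^3 - 3\sigma_1 \sigma_2 + 3\sigma_3$.
For $n = 2$, $m = 3$ the equation (32) becomes


$-\sigma_2 s_1 + \sigma_1 s_2 - s_3 = 0$,


so that $s_3 = \sigma_1^3 - 3\sigma_1 \sigma_2$.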
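
The relations (32) determine the power sums recursively, since the term with $k = m$ is $(-1)^m s_m$ and all other terms involve only $s_1, \dots, s_{m-1}$. A minimal sketch, in which the function name and the convention $\sigma_j = 0$ for $j > n$ (which merges the two cases of (32)) are assumptions of this illustration:

```python
def power_sums(sigma, count):
    """s_1, ..., s_count computed from sigma = [sigma_1, ..., sigma_n] by the relations (32)."""
    n = len(sigma)

    def sig(j):
        # sigma_0 = 1, and sigma_j = 0 for j > n
        if j == 0:
            return 1
        return sigma[j - 1] if j <= n else 0

    s = {}
    for m in range(1, count + 1):
        # m*sigma_m + sum_{k=1}^{m} (-1)^k sigma_(m-k) s_k = 0;
        # `rest` collects everything except the k = m term (-1)^m s_m.
        rest = m * sig(m) + sum((-1) ** k * sig(m - k) * s[k] for k in range(1, m))
        s[m] = -rest * (-1) ** m
    return [s[m] for m in range(1, count + 1)]


# x_1, x_2, x_3 = 1, 2, 3 gives sigma_1, sigma_2, sigma_3 = 6, 11, 6
print(power_sums([6, 11, 6], 4))    # [6, 14, 36, 98]
```

For two indeterminates with the values 1 and 2, `power_sums([3, 2], 3)` ends in 9, in agreement with $s_3 = \sigma_1^3 - 3\sigma_1\sigma_2$ for $n = 2$.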


3. The Use of Indeterminates as a Method of Proof 


3.1. The Derivative of a Product 


The rules for the derivative of a product (20) can be proved most
conveniently in the following way. Over the polynomial ring $R[x]$ we
construct the polynomial ring $R[x][u]$ ($= R[x, u]$). Then by the calculations
leading to (4), (5) we can obviously arrive at an equation


(33)   $f(x) - f(u) = (x - u) F(x, u)$,
with $F(x, u) \in R[x, u]$, such that


(34)   $f'(x) = F(x, x)$.


Now $F(x, u)$ is uniquely determined by (33) (even without the assumption
that $R$ has no divisors of zero). For if $F^*(x, u)$ in $R[x, u]$, considered as a
polynomial in $u$ over $R[x]$, has the leading coefficient $c$, then $-c$ is the
leading coefficient of $(x - u) F^*(x, u)$, so that the latter polynomial is
$\ne 0$, if $F^*(x, u) \ne 0$. Thus $f'(x)$ is characterized by (33) and (34). From
the equation


$g(x) - g(u) = (x - u) G(x, u)$ with $g'(x) = G(x, x)$

corresponding to (33) a short calculation now leads, for $h(x) = f(x) g(x)$, to

$h(x) - h(u) = (x - u)\bigl(F(x, u) g(x) + f(u) G(x, u)\bigr)$.

This is the equation for $h(x)$ corresponding to (33),
so that the equation corresponding to (34) gives us the desired result

$h'(x) = F(x, x) g(x) + f(x) G(x, x) = f'(x) g(x) + f(x) g'(x)$.


It is to be noted that by use of the indeterminate $u$ we have created an
exact proof out of the well-known faulty argument in which the difference
quotient $(f(x) - f(a))/(x - a)$ is first calculated as an entire rational
expression in $x$ and $a$ under the assumption that $x \ne a$ and then $x = a$ is
substituted into this expression. For the field of real numbers our definition
of the derivative of an entire rational function leads to the same result as
the usual definition of analysis by means of the limiting value of the
difference quotient, a fact which follows at once from the continuity of
the entire rational function $F$ in (33).
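
The polynomial $F(x, u)$ of (33) can be written down explicitly from the familiar factorization $x^k - u^k = (x - u)(x^{k-1} + x^{k-2}u + \cdots + u^{k-1})$, and replacing $u$ by $x$ then gives the derivative as in (34). The following sketch uses integer coefficients; the representation of $F$ as a dictionary and the function names are assumptions of this illustration.

```python
def difference_quotient(coeffs):
    """F(x, u) with f(x) - f(u) = (x - u) F(x, u), for f = sum coeffs[k] * x^k.

    Built from x^k - u^k = (x - u)(x^(k-1) + x^(k-2) u + ... + u^(k-1)).
    Returned as {(power of x, power of u): coefficient}.
    """
    F = {}
    for k, a in enumerate(coeffs):
        for j in range(k):
            key = (k - 1 - j, j)
            F[key] = F.get(key, 0) + a
    return F


def substitute_u_by_x(F):
    """Coefficient list (lowest power first) of F(x, x)."""
    deg = max((i + j for i, j in F), default=-1)
    out = [0] * (deg + 1)
    for (i, j), a in F.items():
        out[i + j] += a
    return out


f = [5, 0, -2, 4]                      # f(x) = 5 - 2x^2 + 4x^3
F = difference_quotient(f)
print(substitute_u_by_x(F))            # [0, -4, 12], i.e. f'(x) = -4x + 12x^2
```

No division occurs anywhere, which is the point of working with the indeterminate $u$ instead of a value $a \ne x$.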


3.2. Determinant of a Skew-Symmetric Matrix 
Let


$A = (a_{ik})_{i,k = 1, \dots, n}$


be a skew-symmetric matrix, i.e.,
(35)   $a_{ik} = -a_{ki} \qquad (i, k = 1, \dots, n)$.


For the transpose²⁹ $A^T = (a_{ki})_{i,k = 1, \dots, n}$ we thus have $A^T = -A$. Formation
of determinants then leads to $|A^T| = (-1)^n |A|$. For odd $n$ we have
$|A^T| = -|A|$, so that $2|A| = 0$ in view of the fact that $|A^T| = |A|$.

29 Cf. IB3, §2.6.
Thus, if the $a_{ik}$ are numbers, it follows that $|A| = 0$. But in an arbitrary
ring we may have $2a = 0$ even for $a \ne 0$; for example, in §1.1 we have
noted that $1 + 1 = 0 \ne 1$. Nevertheless, we can prove that even in an
arbitrary commutative ring the determinant of a skew-symmetric matrix
with an odd number of rows and columns is $= 0$, provided that in the
definition of skew-symmetric we further require³⁰ that


(36)   $a_{ii} = 0 \qquad (i = 1, \dots, n)$.


For the proof we construct over the ring of integers the polynomial ring
in the $n(n - 1)/2$ indeterminates $x_{ik}$ ($1 \le i < k \le n$) for an odd number
$n > 1$. With


(37)   $x_{ii} = 0 \quad (i = 1, \dots, n), \qquad x_{ik} = -x_{ki} \quad (1 \le k < i \le n)$


the matrix $X = (x_{ik})_{i,k = 1, \dots, n}$ then satisfies the condition corresponding
to (35), so that $2|X| = 0$. From the fact that the polynomial ring has no
divisors of zero and that in the ring of integers $1 + 1 \ne 0$, it follows that
$|X| = 0$. But by the definition of a determinant $|X|$ is a polynomial in
the $x_{ik}$ ($1 \le i < k \le n$). Thus the coefficients of this polynomial are all
$= 0$. If in an arbitrary ring $R$ we have a matrix $A$ satisfying (35), (36) and
if we replace the indeterminates $x_{ik}$ ($1 \le i < k \le n$) by the $a_{ik}$ with the
same indices, then $X$ is transformed into $A$, in view of (35), (36), (37).

But what happens to the polynomial $|X|$? In order to answer this
question, we consider a polynomial $f(x_1, \dots, x_n) = \sum_{0 \le i_k \le m} c_{i_1 \dots i_n} x_1^{i_1} \cdots x_n^{i_n}$
over the ring of integers and elements $a_1, \dots, a_n$ from an arbitrary
commutative ring $R$. For any natural number $c$ and any $a \in R$ we shall let
$c \cdot a$ denote the sum $\sum_{i=1}^{c} a$ (i.e., the $c$-fold multiple of $a$); also we set
$0 \cdot a = 0$ and $(-c) \cdot a = -(c \cdot a)$.³¹ We then define


$f(a_1, \dots, a_n) = \sum_{0 \le i_k \le m} c_{i_1 \dots i_n} \cdot a_1^{i_1} \cdots a_n^{i_n}$.


Now $f(x_1, \dots, x_n) \to f(a_1, \dots, a_n)$ (for fixed $a_1, \dots, a_n$) is a homomorphism
of the polynomial ring into the ring $R$; that is, $f(x_1, \dots, x_n) + g(x_1, \dots, x_n)$
goes into $f(a_1, \dots, a_n) + g(a_1, \dots, a_n)$ [and therefore $f(x_1, \dots, x_n) -
g(x_1, \dots, x_n)$ into $f(a_1, \dots, a_n) - g(a_1, \dots, a_n)$], and $f(x_1, \dots, x_n)\, g(x_1, \dots, x_n)$

30 In Bourbaki [2] a skew-symmetric matrix with this further property is said to be
“alternating,” although etymologically speaking this word again refers only to the
change of sign under interchange of indices. If $R$ is such that $2a = 0$ implies $a = 0$, then
(36) is obviously a consequence of (35), since by (35) we have $a_{ii} = -a_{ii}$, so that
$2a_{ii} = 0$.

31 If $R$ contains the ring of integers, then obviously $c \cdot a = ca$. Otherwise $ca$ is not
defined and as a substitute for it we simply take $c \cdot a$. The only purpose of the dot is
to call attention to this distinction.
goes into $f(a_1, \dots, a_n)\, g(a_1, \dots, a_n)$, as is easily proved from the formulas
$c \cdot a + c' \cdot a = (c + c') \cdot a$, $(c \cdot a)(c' \cdot a') = cc' \cdot aa'$. The proof of the
first formula, exactly as in IB1, (47), is based on the associativity of
addition. The second formula is easily reduced to the case $c, c' > 0$, when
it can be proved by simply multiplying out the factors [IB1, (41)].

If we now apply this homomorphism to $|X|$, we obtain $|A|$ (since all
additive-multiplicative relations remain unchanged), and on the other
hand we also obtain 0, since the coefficients of $|X|$ are all $= 0$. Thus
$|A| = 0$, as desired.
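
The role of the extra condition (36) can be seen in the ring of residue classes modulo 2, where $1 = -1$. The sketch below is an illustration only: the choice of this ring, the function name `det_mod`, and the use of the Leibniz expansion are assumptions made here, not part of the text.

```python
from itertools import permutations, product


def det_mod(matrix, m):
    """Determinant over the residue classes mod m, by the Leibniz expansion."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
        term = -1 if inversions % 2 else 1
        for row, col in enumerate(perm):
            term *= matrix[row][col]
        total += term
    return total % m


# Modulo 2 the unit matrix satisfies (35), because 1 = -1 there,
# but it violates (36), and its determinant is not 0.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(det_mod(identity, 2))            # 1

# With (36) imposed as well, every 3x3 matrix satisfying (35) has determinant 0.
for a, b, c in product(range(2), repeat=3):
    A = [[0, a, b], [-a, 0, c], [-b, -c, 0]]
    assert det_mod(A, 2) == 0
```

The loop succeeds for the reason given in the proof: the determinant of the matrix $X$ of indeterminates is the zero polynomial, so every substitution yields 0.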


3.3. Determinant of the Adjoint Matrix 


For the matrix $A = (a_{ik})_{i,k = 1, \dots, n}$ ($n > 1$) we denote the subdeterminant
belonging to $a_{ik}$ by $A_{ik}$ or $-A_{ik}$, according as $i + k$ is even or odd (this is
the usual notation; cf. IB3, §3.5.3). Then the adjoint $A^* = (A_{ik})_{i,k = 1, \dots, n}$
of $A$ is such that $A^T A^*$ is a diagonal matrix with every diagonal element
$= |A|$. Formation of determinants leads to


(38)   $|A| \cdot |A^*| = |A|^n$.


If the $a_{ik}$ are the elements of a commutative ring without divisors of zero
(e.g., numbers) and if $|A| \ne 0$, then by (38) we have


(39)   $|A^*| = |A|^{n-1}$.


But we can also prove this equation even when the $a_{ik}$ come from an
arbitrary commutative ring and the case $|A| = 0$ is not excluded. For
just as in §3.2, let us construct the polynomial ring in the $n^2$ indeterminates
$x_{ik}$ ($i, k = 1, \dots, n$) over the ring of integers. For $X = (x_{ik})_{i,k = 1, \dots, n}$ the
determinant $|X|$ is then a polynomial whose coefficients are not all
$= 0$. Thus $|X| \ne 0$. But because the polynomial ring has no divisors of
zero, the above discussion shows that $|X^*| = |X|^{n-1}$, which means
that all the coefficients of the polynomial $|X^*| - |X|^{n-1}$ are $= 0$. As
in §3.2, if we replace $X$ by $A$, we obtain the equation (39) for an arbitrary
matrix $A$ with elements from an arbitrary commutative ring.
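
Equation (39) can be tried out on integer matrices, including a singular one, for which the argument by cancellation in (38) is not available. The sketch below uses the Leibniz expansion so that only ring operations occur; the function names and the sample matrices are assumptions of this illustration.

```python
from itertools import permutations


def det(matrix):
    """Determinant by the Leibniz expansion; only additions and multiplications are used."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
        term = -1 if inversions % 2 else 1
        for row, col in enumerate(perm):
            term *= matrix[row][col]
        total += term
    return total


def adjoint(matrix):
    """The matrix (A_ik) of signed subdeterminants, as defined in the text."""
    n = len(matrix)

    def sub(i, k):
        # delete row i and column k
        return [[matrix[r][c] for c in range(n) if c != k] for r in range(n) if r != i]

    return [[(-1) ** (i + k) * det(sub(i, k)) for k in range(n)] for i in range(n)]


singular = [[1, 2, 3], [2, 4, 6], [0, 1, 5]]    # second row is twice the first
regular = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
for A in (singular, regular):
    n = len(A)
    assert det(adjoint(A)) == det(A) ** (n - 1)  # equation (39)
print(det(singular), det(regular))               # 0 18
```

Because both sides of (39) are obtained from the entries by additions and multiplications alone, the same check carries over unchanged after reducing all entries modulo any integer.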


32 For noncommutative rings the concept of a determinant is of very restricted
usefulness.


CHAPTER 5 


Rings and Ideals 


1. Rings, Integral Domains, Fields 


1.1. The simplest example of a ring is the set of rational integers 
(IB1, §2) 


(G)   $0, \pm 1, \pm 2, \dots$.


This set is closed with respect to addition, subtraction, and multiplication, 
by which we mean that the sum, difference, and product of two rational 
integers is always a rational integer. Furthermore, there are certain 
rules for calculation with these numbers, e.g., the familiar rules for the 
removal of parentheses and for the sign of a product. 

On the other hand, this ring is not closed with respect to division, since 
the quotient of two rational integers is not always a rational integer. 

There are many other examples of rings, e.g., the ring $G[i]$ of Gaussian
integers (IB6) consisting of all numbers $a + bi$, where $i$ is the imaginary
unit (see IB8, §1) and $a$, $b$ are rational integers. The set of Gaussian
integers is easily seen to be closed with respect to addition, subtraction,
and multiplication, and the general rules for calculation with them are
the same as for the rational numbers (IB1).

Another important ring is the ring $G[x]$ of the polynomials


$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$


of all possible degrees $n = 0, 1, 2, \dots$, where the coefficients $a_0, a_1, \dots, a_n$
are rational integers (IB4, §2.1). Since we shall be dealing below with
many other examples of rings, let us first give an exact definition.


1.2. Definition of a Commutative¹ Ring

A (commutative) ring $R$ is a non-empty set of elements $a, b, c, d, \dots$, for
which two operations, namely an addition and a multiplication, are defined;
in other words, for any two elements $a, b \in R$ there exists a uniquely
determined element $c \in R$ which is the result of the addition


$a + b = c$,


and also a uniquely determined element $d \in R$ which is the result of the
multiplication²


$ab = d$.


These operations must satisfy the following laws (also called “axioms” or
“postulates”):


I. The associative law for addition and multiplication: for arbitrary
elements $a, b, c \in R$ we have the equations


(Ia)   $(a + b) + c = a + (b + c)$,
(Ib)   $(ab)c = a(bc)$;


II. The commutative law for addition and multiplication: for arbitrary
elements $a, b \in R$ we have the equations


(IIa)   $a + b = b + a$,
(IIb)   $ab = ba$;


III. Invertibility of addition; i.e., subtraction is always possible: if $a, b$
are any two elements of the ring $R$, there exists a uniquely determined
solution $x$ of the equation


(III)   $a + x = b$.


This operation, inverse to addition, is usually called subtraction, and the
solution of (III) is written in the form


(1)   $x = b - a$,


1 In the present chapter we confine ourselves to commutative rings, usually without
explicit mention of the fact. If all the axioms except (IIb) are satisfied, the ring is said
to be “noncommutative.” Important examples of noncommutative rings are the quaternions
(IB8, §3) and the general matrix rings (IB3, §2.2). The definition of a ring in IB1,
§2.4 is formulated somewhat differently from the one given here, yet it is easy to see
from the following discussion that the two definitions (apart from the assumption here
that multiplication is commutative) are equivalent to each other (see also IB2, §2.4).

2 Multiplication is usually denoted, not by any special sign such as a dot, but merely
by juxtaposition of the two elements.
so that for arbitrary elements $a, b$ we have the identity
(2)   $a + (b - a) = b$;


IV. The distributive law: for arbitrary elements $a, b, c \in R$ we have³
(IV)   $a(b + c) = ab + ac$.

It is easy to verify that all these postulates are satisfied by the examples 
given above. On the other hand, we must also show that these laws, which 
have been reduced to the simplest possible form, imply all the usual rules 
for calculation (IB1, §2.4). 
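
For a finite set the postulates I through IV can be verified by exhaustive checking. The sketch below does this for the residue classes modulo 6; this example is not among those given in the text, and the function name and the brute-force approach are assumptions of the illustration.

```python
from itertools import product


def is_commutative_ring(elements, add, mul):
    """Brute-force check of postulates I-IV on a finite set with given operations."""
    E = list(elements)
    for a, b in product(E, repeat=2):
        if add(a, b) not in E or mul(a, b) not in E:
            return False                               # operations must stay in the set
        if add(a, b) != add(b, a) or mul(a, b) != mul(b, a):
            return False                               # (IIa), (IIb)
        if sum(1 for x in E if add(a, x) == b) != 1:
            return False                               # (III): exactly one solution of a + x = b
    for a, b, c in product(E, repeat=3):
        if add(add(a, b), c) != add(a, add(b, c)):
            return False                               # (Ia)
        if mul(mul(a, b), c) != mul(a, mul(b, c)):
            return False                               # (Ib)
        if mul(a, add(b, c)) != add(mul(a, b), mul(a, c)):
            return False                               # (IV)
    return True


n = 6
print(is_commutative_ring(range(n), lambda a, b: (a + b) % n, lambda a, b: (a * b) % n))  # True
# Taking max as "addition" and min as "multiplication" satisfies I, II and IV
# but not III, so the result is not a ring.
print(is_commutative_ring(range(n), max, min))                                            # False
```

The second call shows that postulate III is an independent requirement: the other laws hold for `max` and `min`, yet an equation such as $\max(1, x) = 0$ has no solution.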


1.3. Remark on the Associative Law 


Up to now we have defined the sum of two elements only, so that if 
three or more elements a, b, c, ..., d are to be added, we may, for example, 
begin by adding the first two elements, and then the third and the fourth, 
and so on, and the order in which these operations are to be performed 
may be indicated by parentheses. But (Ia) states that in a sum of three 
terms it makes no difference how we place the parentheses, and thus we 
may simply omit them. The same result may be deduced for a sum of any 
number of terms, the proof being the same as in IB1, §1.3. Thus for such 
sums it is customary to omit the parentheses and simply to write 


$a + b + c + \cdots + d$.


Furthermore, the commutative law (IIa) shows that in a sum of 
this sort we may permute the terms at will, without affecting the result 
(IB1, §1.4). 

From (Ib) and (IIb) it follows that the same remarks may be made for
products of three or more factors.

The distributive law can also be extended by induction to more than
two summands and to products of factors in parentheses (cf. IB1, §2 (41)):


(3)   $a(b + c + \cdots + d) = ab + ac + \cdots + ad$,
(4)   $(a + b)(c + d) = a(c + d) + b(c + d) = ac + ad + bc + bd$.


1.4. The postulates (Ia), (IIa) and (III) state that the elements of a ring $R$
form an Abelian group with respect to addition; this is the so-called “additive
group” of the ring (IB2, §2.4 and IB1, §2.3).⁴


3 In noncommutative rings, for which (IIb) is not postulated, we must make the
separate postulate
$(b + c)a = ba + ca$.


4 With respect to addition the ring is thus a module, by which we mean an additively
written Abelian group (IB1, §2.3).
In view of the uniqueness (1) of its solution the equation
(5)   $a + b = a + c$


implies $b = c$; in other words, we may cancel equal summands on both
sides of an equation (first rule of cancellation).
By III the equation


(6)   $a + x = a$,
where $a$ is any element of the ring, also has a unique solution
(6')   $x = a - a$.


Now if $b$ is any other element of the same ring, we may add the element
$b - a$ to both sides of (6), which in view of the identity (2) gives


(6'')   $b + x = b$,
so that the same element $x$ is the solution of (6) and of (6''). This element
(7)   $a - a = b - b = o$


is called the zero element of the ring $R$. For the time being we shall denote
it by the letter $o$, but later we shall also use⁵ the customary symbol 0. Then
we have the identities


(8)   $a + o = o + a = a$,
(8')   $a - o = a$,


where (8') follows from (2) and (8), if in (2) we put $o$ for $a$ and $a$ for $b$.
Thus the zero element is the “identity element” or “neutral element” of
the additive group of the ring.

By (III) the equation


$a + x = o$
also has a uniquely determined solution $o - a$, which we abbreviate to $-a$:
$x = o - a = -a$.
Thus every ring element $a$ has an inverse element $-a$ satisfying the identity
(9)   $a + (-a) = o$;
5 The set consisting of the zero element alone satisfies all the postulates for a ring,
but in general we shall assume that besides the zero element, which is always present,
every ring contains at least one further element.
but since $a - a = o$, we may abbreviate $a + (-a)$ to $a - a$. More
generally,

(10)   $b + (-a) = b - a$,

since it follows from (Ia), (IIa), (9) and (8) that
$a + [b + (-a)] = a + (-a) + b = o + b = b$,


which implies the assertion (10) on account of the uniqueness of
subtraction.⁶

If to both sides of the equation (III) we add the element $-x$, it follows
from (9) and (8) that


$b + (-x) = a$,


so that in view of (1) and the uniqueness of subtraction:


(11)   $-(b - a) = a - b$.
1.5. We have 

(12a) b − a = d − c 

if and only if 

(12b) a + d = b + c. 

It follows from the identity (2) and the equation (12a) that 

a + d = a + [c + (d − c)] = a + c + (b − a) = b + c; 

and conversely from (12b) and again from (2) that 

a + c + (b − a) = b + c = a + d, 

so that by the rule of cancellation (5) 

c + (b − a) = d, 

from which (12a) follows by the uniqueness of subtraction. 
6 In IB1, §2.3, the equation (9) is taken as the definition of −a and (10) as the definition 
of b − a, and equation (III), which is here taken as the definition of b − a (= x), is 
proved there as a theorem. The present equation (11) is proved there as equation (36). 

We note the easily proved formulas 

(13) (b − a) + (d − c) = (b + d) − (a + c), 
(14) (b − a) − (d − c) = (b − a) + (c − d) = (b + c) − (a + d), 

which contain the usual rules for a change of sign under the removal of 
parentheses; since these rules are logical consequences of the postulates 
(I) through (III), they are valid in every ring.⁷ Furthermore, if we multiply 
the equation 

c + (b − c) = b 

on both sides by a and apply the distributive law (IV), we have 

ac + a(b − c) = ab, 

from which by subtraction we obtain the following complement to (IV): 

(15) a(b − c) = ab − ac; 

finally, by repeated application of this formula 

(15′) (b − a)(d − c) = (ac + bd) − (ad + bc). 


1.6. Unity Element 
An element e of the ring ℜ satisfying the relation 

(16) ea = ae = a 

for every element a ∈ ℜ is called a unity element (or a unit element, or an 
identity). In particular, 

e² = ee = e. 

Not every ring contains a unity element; e.g., the ring of even integers 
0, ±2, ±4, ... satisfies all the postulates for a ring but contains no unity 
element. But if a ring does contain a unity element, it cannot contain 
more than one; for if e’ were a second unity element, then by (16) we would 


have ee’ = e and also ee’ = e’, so that e = e’. In most cases we shall 
denote the unity element by the usual symbol 1. 


1.7. Divisors of Zero 
If o denotes the (always present) zero element of the ring ℜ, we have 

(17) oa = ao = o 

for every element a ∈ ℜ, as follows immediately from (7) and (15). But in 
certain rings it can happen that a product is equal to zero even though 
neither factor is zero: 

ab = o, with a ≠ o and b ≠ o. 

In this case the two elements a, b are called divisors of zero. 

7 In IB1, §2.1, the equation (13) for the natural numbers a, b, c, d occurs in the 
definition of addition of integers in the form (26), and (12b) occurs in the form (24) in 
the definition of equality of integers; in the present context we are chiefly interested in 
showing that these rules can be deduced from the axioms for a ring. 


Examples of rings with divisors of zero are provided by the residue class 
rings G/n, with composite n (cf. §3.7 and IB6, §4.1). Another example is the 
set of two-rowed matrices (IB3, §2.2): 


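$\begin{pmatrix} a & b \\ b & a \end{pmatrix}$ with a, b ∈ G. 

By the rules for calculation with matrices, these matrices form a commutative 
ring whose zero element is the zero matrix 

$\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$ 

and whose unity element is the unit matrix 

$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$ 

This ring contains divisors of zero, e.g., the two matrices 

$\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$ and $\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},$ 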
whose product is the zero matrix. 
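
The two kinds of zero divisors mentioned above can be checked numerically. The following Python sketch is only an illustration and not part of the original text; the helper names `matmul2` and `zero_divisors_mod` are hypothetical, and the residue class ring G/n is modeled by ordinary integers reduced modulo n.

```python
def matmul2(p, q):
    """Multiply two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def zero_divisors_mod(n):
    """Nonzero residues a mod n for which some nonzero b gives a*b = 0 mod n."""
    return [a for a in range(1, n)
            if any((a * b) % n == 0 for b in range(1, n))]


A = ((1, -1), (-1, 1))
B = ((1, 1), (1, 1))
ZERO = ((0, 0), (0, 0))

print(matmul2(A, B) == ZERO)  # both factors are nonzero, yet the product is zero
print(zero_divisors_mod(6))   # composite modulus
print(zero_divisors_mod(7))   # prime modulus
```

For a composite modulus such as 6 the list is nonempty, whereas for a prime modulus it is empty.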


1.8. A subset 𝔖 of a ring ℜ that is closed with respect to addition, 
subtraction, and multiplication satisfies all the ring postulates as a part of 
ℜ and is thus called a subring of ℜ; e.g., the set of all integers divisible 
by 3, namely 0, ±3, ±6, ..., forms a subring of the ring G of all the rational 
integers. 

As the following example shows, it can happen that the unity element of the 
subring 𝔖 is different from that of ℜ. Let ℜ be the ring of all two-rowed diagonal 
matrices $\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}$ with a, b ∈ G, and let 𝔖 consist of all such matrices with b = 0. 
The unity element of ℜ is $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and that of 𝔖 is $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$. Like G, the subring 𝔖 
has no divisors of zero, but ℜ does have such divisors. 


1.9. Integral Domains 


The feature of greatest importance for the structure of a ring is the 
presence or absence of divisors of zero; in order to emphasize this feature 
with a special name, (commutative) rings without divisors of zero are 
also called integral domains⁸ (or domains of integrity). Most of the rings 
in the present chapter have no divisors of zero and are therefore integral 
domains, e.g., all the polynomial rings G[x], G[x, y], G[x, y, z], ... (cf. 
IB4, §2). 

In an integral domain we may cancel any nonzero factor that appears 
on both sides of an equation; that is, we have the second rule of cancellation: 

(18) ab = ac and a ≠ o imply b = c.⁹ 

The above rule can also be stated: An integral domain ℑ is a ring in which 
the solution of an equation 

(19) ax = b, a, b ∈ ℑ, a ≠ o, 

is unique, provided it exists at all. For if there were two distinct solutions 
x₁ and x₂, we would have ax₁ = ax₂ = b, so that a(x₁ − x₂) = o, which 
would imply the existence of two divisors of zero a ≠ o and x₁ − x₂ ≠ o. 
On the other hand, if a ring has divisors of zero: ab = o, a ≠ o, b ≠ o, then 
ax = a(x + b) for every element x. 


1.10. Fields 


It may happen that in the given ring ℑ every equation (19) is uniquely 
solvable without exception; such a ring is called a field (see also IB1, §3.2). 
In a field, division (with the exception of division by zero) is unique and 
always possible. In other words, we may define a field as a set of elements 
in which besides the above listed postulates for a ring (1.2), the following 
postulate holds: 

Postulate for a field: In a field 𝔎 every equation ax = b with a ≠ o¹⁰ but 
with otherwise arbitrary elements a, b ∈ 𝔎 has a uniquely determined 
solution. 

It follows that every field is free of divisors of zero and has a unity element. 
The freedom from divisors of zero is proved in exactly the same way as 
for integral domains (§1.9), and the existence of a unity element follows 
from the solvability of the equation 


ax = a. 

For if x = e is a solution for some definite element a ≠ o, so that ae = a, 
and if b is any other element of the field, then by the postulate for a field 
there exists an element c satisfying the equation ca = b. If we multiply 
by c on both sides of the equation ae = a, we obtain be = b, for every 
element b ∈ 𝔎; in other words e is the unity element (§1.6). 

8 In analysis they are usually called integral rings, since the word "domain" could 
easily lead to ambiguity. 

9 Since a(b − c) = o and a is neither zero nor a divisor of zero, it follows that 
b − c = o, so that b = c. 

10 If a = o, it follows from (17) that b = o, and then every element x is a solution of 
the equation ax = b. 

By postulates (Ib), (IIb) and the field postulate, the elements of a field, 
excluding the zero element, form an Abelian group with respect to 
multiplication; this group is called the multiplicative group of the field 
(for the concept of a group see IB2). Conversely, the field postulate is a 
consequence of this property. Thus we have the following important 
result:¹¹ Every integral domain with finitely many elements is a field; for 
example, every residue class ring G/p modulo a prime p is a field (IB6, 
§4.3). 

For the proof let x₁, x₂, ..., xₙ be the finitely many elements of the 
integral domain ℑ and multiply them one after another by the element 
a (a ≠ o). By the rule of cancellation (18), the products ax₁, ax₂, ..., axₙ 
are distinct and therefore represent all the elements of ℑ, including the 
element b; thus we have a solution for the equation ax = b. 
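
The counting argument in this proof can be traced in the finite integral domain G/p. The following Python sketch is an illustration under the assumption that G/p is modeled by the integers 0, 1, ..., p − 1 with arithmetic modulo p; the function name `solve_in_residue_field` is hypothetical.

```python
def solve_in_residue_field(a, b, p):
    """Solve a*x = b in G/p (p prime, a != 0 mod p) by listing all products a*x."""
    products = [(a * x) % p for x in range(p)]
    # By the cancellation rule (18) the p products are pairwise distinct,
    # so they run through every residue class exactly once.
    assert sorted(products) == list(range(p))
    return products.index(b % p)


x = solve_in_residue_field(3, 5, 7)
print(x, (3 * x) % 7)
```

The search is exhaustive and therefore only sensible for small p; it mirrors the proof rather than offering an efficient method.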


If we do not postulate the commutativity of multiplication, then in order 
to obtain the analogue of a field, we must postulate that every equation ax = b 
and also every equation ya = b is solvable, provided only a ≠ o. (It is sufficient 
to postulate the existence of the solutions, since their uniqueness can then be 
proved.) The resulting system of elements is called a skew field. From the 
postulates it follows again that the elements of a skew field, excluding the 
zero element, form a group; so that the existence of an identity element and 
inverse elements a⁻¹, the absence of divisors of zero in a skew field, and the 
uniqueness of the postulated solutions can then be proved exactly as in group 
theory (IB2, §2.4). We can also show that every finite skew field (i.e., every 
skew field containing only finitely many elements) is necessarily commutative 
and is therefore a field. 


1.11. Prime Fields and the Characteristic of a Field 


The simplest and best known field is the field R of all rational numbers 
(IB1, §3.2). Next in simplicity are the above-mentioned residue class rings 
G/p, i.e., fields with finitely many (namely p) elements. Since these fields 
contain no proper subfields, we call them prime fields. The residue class 
rings are prime fields "of characteristic p" (§3.7), and the field R of rational 
numbers is a prime field of "characteristic 0." 

We already know that every field 𝔎 contains both a zero element o and 
a unity element e; but then 𝔎 also contains the elements e + e, e + e + e, 
and so forth, which we abbreviate to 2e, 3e, ... (ne is the sum of n summands 
e). Now it may happen that these elements are not all distinct, but that 
some ne is the same as some earlier me: 

ne = me, so that (n − m)e = o. 

Then if p is the smallest natural number for which pe = o, it follows that 
p must be prime¹² and the elements o, e, 2e, ..., (p − 1)e are distinct. In 
this case we say that the field 𝔎 has "characteristic p"; and otherwise, if 
all the elements e, 2e, ..., ne, ... are distinct, we assign to 𝔎 the 
"characteristic 0." 

11 Cf. IB2, §2.4, where it is proved that a finite set with an associative operation (V), 
(A) and rules for cancellation (K,), (K;) is a group. 

The characteristic of an integral domain ℑ with unity element e is defined 
in exactly the same way. 

It can be shown that every field 𝔎 which is not a prime field contains as 
its smallest subfield a prime field of the above type with the same 
characteristic as 𝔎. 
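
The definition of the characteristic can be followed step by step by adding the unity element to itself until the zero element is reached. The Python sketch below is an illustration only; it assumes the residue class ring G/n is modeled by integers modulo n, and the function name `characteristic` is hypothetical.

```python
def characteristic(n):
    """Smallest k >= 1 with k*e = 0 in the residue class ring G/n, where e = 1."""
    total, k = 1 % n, 1
    while total != 0:
        total = (total + 1) % n  # add one more summand e
        k += 1
    return k


print(characteristic(7))  # G/7 is a field
print(characteristic(6))  # G/6 is not an integral domain:
print((2 * 3) % 6)        # (2e)(3e) = 6e = 0 although 2e and 3e are nonzero
```

The case n = 6 shows why a composite value cannot occur for a field or an integral domain: the elements 2e and 3e would then be divisors of zero.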


1.12. Quotient Fields 

For every integral domain ℑ that is not a field we can construct a field, 
the quotient field, that contains ℑ and is constructed from ℑ in exactly the 
same way as the field R of rational numbers is constructed from the integral 
domain G of rational integers (IB1, §3.2). We first construct the set of all 
formal fractions a/b, where a, b are arbitrary elements of ℑ with b ≠ o. 
In this set of fractions we introduce a partition into classes by means of the 
equality: 

a/b = c/d if and only if ad = bc. 

An element a in ℑ is identified with the class of fractions ab/b. Computation 
with these fractions follows the well-known rules. The proof of these 
statements is to be found in IB1, §3.1-3.2; the argument given there, as 
was remarked at that time, can be transferred verbatim to the present case, 
since the only necessary postulate is that the domain ℑ be a commutative 
ring without divisors of zero, i.e., an integral domain. 

The domain 𝔎 constructed in this way is a field, the quotient field¹³ of ℑ; 
this field contains ℑ as a subdomain, and the results of any computation 
in ℑ remain valid for 𝔎. 
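
The construction by formal fractions can be sketched for the integral domain G of integers. The Python class below is a minimal illustration, not a full implementation; the names `Frac` and `embed` are hypothetical, and only equality, addition, and multiplication are provided.

```python
class Frac:
    """Formal fraction a/b over the integers G, with b != 0."""

    def __init__(self, a, b):
        if b == 0:
            raise ZeroDivisionError("denominator must be nonzero")
        self.a, self.b = a, b

    def __eq__(self, other):
        # a/b = c/d if and only if ad = bc
        return self.a * other.b == self.b * other.a

    def __add__(self, other):
        return Frac(self.a * other.b + other.a * self.b, self.b * other.b)

    def __mul__(self, other):
        return Frac(self.a * other.a, self.b * other.b)


def embed(a, b=1):
    """Identify the integer a with the class of the fraction ab/b (b != 0)."""
    return Frac(a * b, b)


print(Frac(1, 2) == Frac(3, 6))
print(Frac(1, 2) + Frac(1, 3) == Frac(5, 6))
print(embed(3, 5) == embed(3, 7))
```

Fractions are deliberately not reduced: two fractions represent the same element exactly when the cross products agree, which is all the class partition requires.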

The most important quotient fields are the field R of rational numbers, 
as quotient field of the integral domain G of rational integers, and the 
field R(x) of all polynomial quotients in one indeterminate x with rational 
coefficients. The field R(x) is a quotient field not only of G[x] but also of 
R[x]. 

12 For if p were 6, say, there would be divisors of zero, since we would then have 
(2e)(3e) = (e + e)(e + e + e) = 6e = o, 
and the minimal property of p would mean that 2e ≠ o, 3e ≠ o. 

13 Two different integral domains may belong to the same (or isomorphic) quotient 
fields; e.g., the rational field R can also be obtained as the quotient field of the integral 
domain of all even numbers. 


1.13. Isomorphism 


The concepts of isomorphism and homomorphism are defined (see also 
IB1, §2.4) for rings in the same way as for groups (IB2, §4.2 and §10.1). 
Two rings ℜ and ℜ* are said to be isomorphic, in symbols ℜ ≅ ℜ*, if 
the elements a, b, c, ... of ℜ are in one-to-one correspondence with the 
elements a*, b*, c*, ... of ℜ*: 

a ↔ a*, b ↔ b*, c ↔ c*, ... 

in such a way that this correspondence is consistent with addition and 
multiplication; i.e., if for all a, b we have 

a + b ↔ a* + b*, ab ↔ a*b*. 

Since the postulates (§1.2) are satisfied in both rings, it follows that the 
correspondence is also consistent for subtraction and for more complicated 
expressions, e.g., 

a − b ↔ a* − b*, a(b + c) ↔ a*(b* + c*), etc. 

Thus, the zero element o of ℜ corresponds to the zero element o* of 
ℜ*, the inverse element −a corresponds to the inverse element −a*, 
a divisor of zero in ℜ corresponds to a divisor of zero in ℜ*, and the unity 
element of ℜ (if it exists) corresponds to the unity element of ℜ*. If ℜ 
contains a unity element, ℜ* also necessarily contains one, and if ℜ 
contains divisors of zero, then so does ℜ*; if ℜ is an integral domain 
or a field, then so is ℜ*, and conversely. 

If ℜ₁ is a subring (that is, a subset closed with respect to addition, 
subtraction, and multiplication; see also IB4, §2.1) of ℜ, then the 
corresponding elements in ℜ* form a subring ℜ₁* isomorphic to ℜ₁: ℜ₁ ≅ ℜ₁*. 

Isomorphism between rings is an equivalence relation (cf. IA, §8.5 and 
IB1, §2.2); every ring is trivially isomorphic to itself, and if two rings are 
isomorphic to a third, then they are isomorphic to each other. As far 
as their algebraic structure as rings is concerned, two isomorphic rings can 
differ only in the notation, so that we can identify them by disregarding 
all the properties of their elements that do not affect their structure as 
rings.¹⁴ 

If the rings ℜ and ℜ* contain a common subring ℜ₁, then ℜ is said 
to be "isomorphic to ℜ* with respect to ℜ₁" if every element of the 
subring ℜ₁ corresponds to itself. 

If the mapping of the ring ℜ onto the ring ℜ* is (possibly) many-to-one, 
so that each of the elements a, b, ... of ℜ has a unique image a*, b*, ... 
in ℜ*: 

a → a*, b → b*, ..., 

and if every element d* in ℜ* has at least one (and perhaps more than one) 
preimage d in ℜ: d → d*, and if further the mapping is consistent with 
the operations of addition and multiplication: 

a + b → a* + b*, ab → a*b*, 

then ℜ* is said to be a homomorphic image of ℜ, or in symbols ℜ ≃ ℜ* 
(cf. §3.6). 

14 In the example of §1.8 the two rings 𝔖 and G are isomorphic. 
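
A familiar many-to-one mapping of this kind sends each integer to its residue modulo n. The Python sketch below checks the two consistency conditions on a finite sample; it is an illustration only, the name `reduce_mod` is hypothetical, and the choice n = 6 is arbitrary.

```python
def reduce_mod(n):
    """Return the map a -> a* sending an integer to its residue modulo n."""
    return lambda a: a % n


star = reduce_mod(6)
pairs = [(a, b) for a in range(-10, 11) for b in range(-10, 11)]

# Consistency with addition and multiplication, computed in G/6 on the right.
sums_ok = all(star(a + b) == (star(a) + star(b)) % 6 for a, b in pairs)
products_ok = all(star(a * b) == (star(a) * star(b)) % 6 for a, b in pairs)

print(sums_ok, products_ok)
print(star(1) == star(7))  # many-to-one: distinct integers share an image
```

A check on finitely many pairs is of course no proof; it merely makes the two conditions concrete.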


2. Divisibility in Integral Domains 


2.1. In the present section we consider integral domains ℑ with a unity 
element 1; in particular, the integral domain G of all rational integers, 
the polynomial rings G[x] and R[x] of all polynomials in x with 
coefficients in G or R, and the field of rational numbers (IB4, §2). In an 
integral domain ℑ the field postulate (§1.10) is not satisfied, in general; 
i.e., the equation ax = b for given a, b generally has no solution in ℑ; but 
in the special cases in which this equation does have a solution¹⁵ we say 
that b is divisible by a and write 

(19′) a | b, or: "a divides b." 

We also say that a is a divisor, or a factor, of b. 

The relation of divisibility is transitive; i.e., from (19′) and b | c it 
follows that a | c. For by hypothesis there exist in ℑ elements x and y 
such that 

ax = b and by = c, 

but then z = xy is a solution of the equation az = c, so that a | c. 
Since 1a = a, every element a in ℑ has the so-called "trivial" divisors 
1 and a.¹⁶ 


2.2. Units 


In the theory of divisibility in an integral domain ℑ an important role is 
played by the so-called units, defined as divisors of the unity element; for 
example, the elements e₁ and e₂ are units if 

(20) e₁e₂ = 1. 

In G, and also in G[x], the only units are 1 and −1; in R[x], on the other 
hand, every nonzero polynomial of zero degree, i.e., every nonzero rational 
number, is a unit.¹⁷ In the integral domain G[i] of the Gaussian numbers 
there are four units: 1, −1, i, −i, and similar remarks hold for other rings of 
algebraic numbers (IB6, §8). 

15 If the solution exists, it is unique (§1.9). By (17) every element a ∈ ℑ is a divisor of 
the zero element: a | 0. But by a "divisor of zero" we mean only the elements defined 
in §1.7. Divisors of zero do not occur in integral domains. 

16 Concerning the properties of divisibility in an integral domain compare the 
discussion in IB6, §1. 

Thus the units are characterized by the property that they have a 
reciprocal element in ℑ; for it follows from (20), in the usual notation, that 

e₁⁻¹ = 1/e₁ = e₂ ∈ ℑ 

and conversely. The product of two or more units is again a unit; for it 
follows from (20) and e₃e₄ = 1 that (e₁e₃)(e₂e₄) = 1, so that e₁e₃ and e₂e₄ 
are also units. Thus the units of an integral domain form an Abelian group 
(IB2, §1.1) with respect to multiplication. 

Two elements a and a′ = ae of an integral domain ℑ are called 
associates¹⁸ if each is the product of the other by a unit of ℑ. Associate 
elements a, a′ are characterized by the divisibility relations 

(21) a | a′ and a′ | a, 

the second of which follows from a′e⁻¹ = a. Thus each of two associates is 
divisible by the other, and conversely. For by (21) there exist two elements 
x, y in ℑ such that 

ax = a′ and a′y = a, 

from which we have a(xy) = a, and by the rule (18) of cancellation 
xy = 1; thus x and y = x⁻¹ are units, so that a and a′ are associates, 
as was to be proved. 

Every element a ≠ 0 in ℑ has as its trivial factors all the units and all 
its associates; by a proper factor of a we mean a factor which is not an 
associate of a. 


2.3. Irreducible and Prime Elements 


An element a (≠ 0) in ℑ is said to be irreducible if it has only trivial 
factors, i.e., if a factorization a = bc into two factors is possible only when 
one of the factors is a unit and the other is an associate of a. Otherwise a is 
said to be reducible. 

Examples of irreducible elements are: all the prime numbers in G, and 
all the irreducible polynomials in G[x]. But we must not confuse 
"irreducible" with "prime," even though in certain integral domains, 
including the most important ones (namely G, G[x], and R[x]), every 
irreducible element is prime and conversely, as will be proved below. 

17 In a field every element is divisible by every other element (except 0). Since this 
statement is true in particular for 1, every nonzero element of a field is a unit. 

18 In particular, every element a is an associate of itself. 

19 It is convenient to include the units among the proper factors. 

An element p in ℑ is said to be prime if p | ab implies at least one of the 
two relations p | a, p | b;²⁰ i.e., a product is divisible by a prime element p 
if and only if at least one of its factors is divisible by p.²¹ We now prove the 
following theorem. 


2.4. Every prime element is irreducible, but the converse is 
not necessarily true. 


For let p = ab be a factorization of the prime element p, so that p | ab; 
then one of the factors, say a, must be divisible by p. Writing a = pc, 
we obtain p = pbc, or 1 = bc, by cancellation (18). Therefore b is a unit 
and a is an associate of p, so that p = ab is a trivial factorization of p. 
Thus p has no nontrivial factorization, which means that p is irreducible. 


On the other hand, it may happen in certain integral domains that an element 
is irreducible but not prime. For example, in the integral domain G[√−3] of 
all numbers a + b√−3 with a, b ∈ G²² the number 2 is irreducible, because 
the Diophantine equation (IB6, §7) 

2 = (a + b√−3)(a − b√−3) = a² + 3b² 

clearly has no solution in integers. 

On the other hand, the number 2 is not prime in G[√−3], since the product 

(1 + √−3)(1 − √−3) = 4 

is divisible by 2, although neither of the factors 1 + √−3 and 1 − √−3 has 
this property. 
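
Both claims about the number 2 in G[√−3] can be verified by direct computation. In the Python sketch below, which is an illustration only, a number a + b√−3 is stored as the pair (a, b); the helper names `mul`, `norm`, and `divisible_by_2` are hypothetical.

```python
def mul(u, v):
    """Multiply a + b*sqrt(-3) and c + d*sqrt(-3), each stored as a pair."""
    (a, b), (c, d) = u, v
    return (a * c - 3 * b * d, a * d + b * c)


def norm(u):
    """(a + b*sqrt(-3)) * (a - b*sqrt(-3)) = a^2 + 3*b^2."""
    a, b = u
    return a * a + 3 * b * b


def divisible_by_2(u):
    """2 divides a + b*sqrt(-3) in G[sqrt(-3)] exactly when a and b are even."""
    return u[0] % 2 == 0 and u[1] % 2 == 0


# a^2 + 3*b^2 = 2 would require |a| <= 1 and |b| <= 1, so this window suffices.
solutions = [(a, b) for a in range(-2, 3) for b in range(-2, 3)
             if norm((a, b)) == 2]

product = mul((1, 1), (1, -1))
print(solutions)                    # no integer solutions
print(product)                      # the pair representing 4
print(divisible_by_2(product),
      divisible_by_2((1, 1)),
      divisible_by_2((1, -1)))
```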


However, we shall prove below that in the integral domains of greatest 
importance to us, in particular in G, G[x], and R[x], every irreducible 
element is also prime, so that in these domains the two concepts may be 
identified. 


20 In the language of the theory of ideals (§3) we may say: an element p is prime if 
the ideal (p) generated by p is a prime ideal (§3.6). 

21 The zero element of an integral domain ℑ is not regarded as a prime element 
although, formally speaking, it is both irreducible and prime. 

22 We can construct a simpler example in an integral domain without unity element. 
For example, let G₂ be the integral domain of all even numbers; then the number 30 is 
irreducible, because every product of two even numbers is divisible by 4. On the other 
hand, 30 is not prime in G₂, since the product 6 · 10 = 60 is divisible by 30, although 
neither factor has this property. 


2.5. Divisor Chain Theorem 


This important theorem, valid for all the integral domains considered 
here, will be needed in several places below. It states that a proper divisor 
chain 


(22) a₁, a₂, ..., aᵢ, ... (aᵢ ≠ 0, aᵢ₊₁ | aᵢ, i = 1, 2, ...), 

i.e., a sequence of elements aᵢ in the given integral domain such that each 
aᵢ is a proper divisor of its predecessor aᵢ₋₁, contains only finitely many 
terms. 

The divisor chain condition holds in G, because the sequence of absolute 
values |a₁|, |a₂|, ... corresponding to a divisor chain (22) consists of 
monotonically decreasing natural numbers, so that the smallest number 1 
must be reached after a finite number of steps.²³ 

The same result can be proved for a divisor chain (22) in the polynomial 
ring R[x] by considering, instead of the absolute value, the degree of the 
successive polynomials in x. To prove the result for G[x], we may consider 
the sum of the degree of the polynomial and the absolute value of its first 
coefficient.²⁴ 

If the divisor chain condition holds in an integral domain ℑ, every 
element a (≠ 0) in ℑ can be written in at least one way as the product of 
(finitely many) irreducible elements uᵢ: 

(23) a = u₁u₂ ··· uₛ. 

For if a = u is irreducible, we already have a representation (23) with 
s = 1; but if a is reducible, we can split up each factor into irreducible 
terms, which must be reached after finitely many steps, since otherwise 
we would have a nonterminating proper divisor chain, in contradiction to 
the hypothesis. 
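
For the integral domain G the splitting process just described can be carried out by trial division. The Python sketch below is one simple way to do this for a positive integer greater than 1, offered as an illustration only; the function name `irreducible_factors` is hypothetical, and signs (the units ±1) are ignored.

```python
def irreducible_factors(a):
    """Split an integer a > 1 into irreducible (prime) factors by trial division."""
    factors, d = [], 2
    while d * d <= a:
        while a % d == 0:
            factors.append(d)
            a //= d  # a is replaced by a proper divisor, so the process must stop
        d += 1
    if a > 1:
        factors.append(a)  # what remains has no divisor up to its square root
    return factors


print(irreducible_factors(100))
print(irreducible_factors(97))
```

Each division replaces the current number by a proper divisor, so termination is an instance of the divisor chain condition in G.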


In general, the representation (23) is not unique, even apart from associate 
elements and the order of the factors. For example, in G[√−3] (see also §2.4) 
the number 4 can be represented in two essentially different ways as the product 
of irreducible numbers (in G[√−3]): 

4 = 2 · 2 = (1 + √−3)(1 − √−3). 

But if in a given integral domain ℑ every representation (23) is unique in the 
above sense, then every irreducible element in ℑ is prime, so that the concepts 
"irreducible" and "prime" coincide. For if ℑ were to contain an irreducible 
element u that is not prime (§2.3), we could find an equation ua = bc with 
elements a, b, c in ℑ such that u is not a divisor of b nor of c. But now we could 
derive a contradiction by splitting up the elements a, b, c into their irreducible 
factors and noting that both sides of the equation must yield the same 
factorization, for then the irreducible element u must be an associate of an 
irreducible factor in the right-hand side, arising either from b or from c. 

23 Thus G contains, among others, the following divisor chains beginning with 100: 
100, 50, 10, 2, 1 and 100, 20, 4, 2, 1. 

24 It is not easy to construct examples of integral domains without the divisor chain 
condition; one example is the integral domain of all algebraic integers, containing the 
infinite proper divisor chain: 

$\sqrt{2},\ \sqrt[4]{2},\ \sqrt[8]{2},\ \ldots,\ \sqrt[2^n]{2},\ \ldots$ 


2.6. Unique Factorization (u.f.) Rings 


Of great importance are the integral domains ℑ in which every 
factorization (23) is unique. Such rings are called u.f. rings; in them we 
have the u.f. theorem (unique factorization theorem):²⁵ Every element 
a (≠ 0) of a u.f. ring ℑ can be expressed uniquely as the product of prime 
elements pᵢ:²⁶ 

(24) a = p₁p₂ ··· pₛ, 

where uniqueness means that the prime elements pᵢ in (24) are uniquely 
determined up to order and unit factors. 

We now prove the theorem: an integral domain ℑ with unity element 
satisfying the divisor chain condition and the condition that every irreducible 
element is prime is a u.f. ring. 

Since the existence of at least one factorization (24) has just been shown 
to follow from the divisor chain condition, only the uniqueness remains 
to be proved. Let us assume that 

(24′) a = q₁q₂ ··· qₜ 

is a second factorization with prime elements qᵢ; since the product 
q₁q₂ ··· qₜ is divisible by the prime element p₁, at least one factor must be 
divisible by p₁. We may assume that the order of factors in (24′) is such 
that p₁ | q₁; since q₁ is also prime, it follows that p₁ and q₁ are associates: 
q₁ = ep₁ with a suitably chosen unit e. By setting (24) and (24′) equal to 
each other and cancelling p₁ we obtain the result: p₂p₃ ··· pₛ = eq₂q₃ ··· qₜ. 
Proceeding in this way we find successively that q₂ is an associate of 
p₂, q₃ of p₃, ..., qₛ of pₛ, and also that t = s, which completes the proof. 

In the quotient field of a u.f. ring, every fraction a/b can be written “in 
lowest terms,” i.e., in such a way that the numerator and the denominator 
have no prime factor in common; the reduction to lowest terms is 
essentially unique, since it can be altered only by the adjunction of the 
same unit factor to numerator and denominator: a/b = ea/eb. 

25 If the factorization (23) is unique every irreducible element is also prime, as has 
just been shown, so that here the two concepts coincide. 

26 Of course, two or more of the prime factors pᵢ may be equal. 


2.7. Let a, b be two arbitrary elements in the integral domain ℑ. 
An element d ∈ ℑ which is a divisor of a and also of b is called a common 
divisor of a, b. We say that a, b are coprime if they have only the units 
as common divisors. 

An element d is called the greatest common divisor of a, b, or in symbols 

(25) GCD(a, b) = d, 

if d is a common divisor of a, b and every other common divisor d′ of 
a, b is a divisor of d: d′ | d. The GCD(a, b, c, ..., d) of the elements 
a, b, c, ..., d is defined analogously. 

It is obvious that if the greatest common divisor of two or more elements 
exists, it is determined only up to a unit factor. 


2.8. Euclidean Rings 

In a u.f. ring it is easy (cf. IB6, §2.6) to read off the GCD(a, b) from the 
"canonical factorization" (24) of a and b; but often these factorizations 
are not immediately known, and generally it is a very time-consuming 
task to determine them. Thus it is advantageous to have another procedure 
for finding the GCD, namely, the Euclidean algorithm, which can be 
carried out in certain rings, the so-called Euclidean rings, and leads very 
quickly to the GCD(a, b). Let us now discuss these concepts. 

An integral domain ℑ²⁷ is called a Euclidean ring if the division algorithm 
is available, i.e., if to every element a (≠ 0) in ℑ it is possible to assign 
a nonnegative integer H(a) such that 

(26′) H(ab) ≥ H(a) for all a, b (≠ 0) in ℑ 

and for any two given elements a, b (≠ 0) in ℑ we can find elements q and r 
in ℑ such that 

(26″) a = bq + r with H(b) > H(r) or r = 0. 

The integral domains G and R[x] are Euclidean rings, since in G the 
correspondence H(a) = |a|, and in R[x] the correspondence H(a) = "the 
degree of the polynomial a," satisfy the conditions (26′) and (26″) (see 
IB6, §2.10); and for the same reason every polynomial ring 𝔎[x] over a 
field 𝔎 of coefficients is a Euclidean ring.²⁸ 


27 We need not assume the existence of a unity element in ℑ, because its existence, as 
we shall show in §2.10, follows from (26′) and (26″). 

28 Certain rings of algebraic numbers, e.g., the rings G[i], G[√−2], G[ζ] with 
ζ = (−1 + √−3)/2, are Euclidean (see IB6, §2.10). 

2.9. In a Euclidean ring it is possible to carry out the Euclidean algorithm, 
which consists of a certain repetition of the division algorithm (26″). 
Applied to two elements a, b (≠ 0) of a Euclidean ring ℑ this algorithm 
leads, as will be shown in detail in IB6, §2.10, to the result: 

(I) The last nonzero remainder is the GCD(a, b); 
(II) There exist in ℑ two (coprime) elements x, y satisfying the equation 

(27) ax + by = GCD(a, b).²⁹ 

In particular, if the elements a, b are coprime, then GCD(a, b) = 1, 
and there exist two elements x, y ∈ ℑ satisfying the equation 

(28) ax + by = 1; 

conversely, it follows from such an equation that a, b are coprime. The 
elements x, y can be calculated by the Euclidean algorithm, and every 
pair of the form 

x′ = x + bc, y′ = y − ac 

with arbitrary c ∈ ℑ is then a solution of (27) or (28). 
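
For the Euclidean ring G the computation of x and y can be sketched as follows. This Python version of the extended Euclidean algorithm is a standard formulation given here as an illustration, not the presentation of IB6; the function name `extended_gcd` is hypothetical, and for positive inputs the returned d is the positive GCD.

```python
def extended_gcd(a, b):
    """Return (d, x, y) with a*x + b*y = d, where d is a GCD of a and b."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r                    # division algorithm: old_r = q*r + remainder
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y            # old_r is the last nonzero remainder


d, x, y = extended_gcd(240, 46)
print(d, x, y, 240 * x + 46 * y)

# Every pair x' = x + b*c, y' = y - a*c solves the same equation.
c = 5
print(240 * (x + 46 * c) + 46 * (y - 240 * c) == d)
```

The invariant maintained in the loop is that each remainder is an integral combination of a and b, which is exactly what equation (27) asserts for the last nonzero remainder.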


2.10. (I) Every Euclidean ring contains a unity element. 
(II) In a Euclidean ring the divisor chain condition (§2.5) is satisfied. 
(III) In a Euclidean ring irreducible elements and prime elements coincide 
(§2.3). 
Finally, it follows (§2.6) that every Euclidean ring is a u.f. ring.³⁰ 

Proof of (I). Let a (≠ 0) be an element of the Euclidean ring ℑ (§2.8) 
for which H(a) has the smallest possible value, and let b ∈ ℑ be arbitrary. 
Then the division algorithm (26″) applied to the pair b, a gives: 

b = aq + r; 

but here we must have r = 0, since otherwise H(r) < H(a), contrary to 
assumption. Thus b = aq; that is, every element b ∈ ℑ is divisible by 
a. In particular, for b = a we have the equation a = ae or, after 
multiplication by q, also b = be, so that e is the unity element in ℑ (§1.6). 

Proof of (II). The fact that ℑ cannot contain a proper divisor chain 
(§2.5) with infinitely many terms follows immediately from the lemma: 
if b is a proper divisor of a, then H(b) < H(a). 

To prove the lemma, let a = bc and let the division algorithm (26″), 
when applied to b, a, give 

b = aq + r, 

where we must have r ≠ 0, since otherwise a and b would be divisible 
by each other and would therefore be associates, contrary to assumption. 
Thus H(r) < H(a); on the other hand, it follows from 

r = b − aq = b(1 − cq) 

and from 1 − cq ≠ 0 (since by assumption c is not a unit) 
that H(b) ≤ H(r), by (26′), and therefore H(b) < H(a), as was to be 
proved. 

29 If we expand a/b in a regular continued fraction (IB6, §3.1), the numerator and the 
denominator of the second-to-last approximating fraction, taken with suitable signs, 
form a solution y, x of equation (27). 

30 This theorem is also a consequence of the ideal-theoretic theorems §3.3 and 3.4. 


Proof of (III). Now let p denote an irreducible element of ℑ and assume 
that the product ab is divisible by p, so that pq = ab. If p | a, then the 
criterion in §2.3 for p to be a prime is already satisfied; thus we assume 
that a is not divisible by p and then prove that we must have p | b. But 
p, a are now coprime, since the irreducible element p has only associates 
and units as its divisors, and therefore by (28) there exist elements x, y in 
ℑ such that 

px + ay = 1. 

If we multiply this equation by b and the equation 

pq − ab = 0 

by y and add, we obtain p(bx + qy) = b, so that p | b, as was to be proved. 

Since the important integral domains G and R[x] have already been 
proved to be Euclidean rings (§2.8), it now follows that they are also u.f. 
rings. The same result follows for all polynomial rings 𝔎[x] over a field 𝔎 of 
coefficients. 

But the converse is not true, since there exist u.f. rings that are not Euclidean. 
An example is the ring G[x] of all integral polynomials in x which, as we shall 
prove in §2.14, is a u.f. ring but is not Euclidean, for if it were, we could apply 
the Euclidean algorithm to the coprime elements x and 2 and obtain an 
equation (28): 

xh(x) + 2k(x) = 1 with h(x), k(x) ∈ G[x], 

which leads to a contradiction, as is clear at once if we set x = 0. 


2.11. The Polynomial Ring $\mathfrak{J}[x]$ 


Let us now discuss, somewhat more generally, the polynomial ring $\mathfrak{J}[x]$ 
consisting of all the polynomials 

(29) $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$ 

of degree n = 0, 1, 2, ..., with coefficients $a_i \in \mathfrak{J}$,$^{31}$ where we shall assume 
that $\mathfrak{J}$ is a u.f. ring. For example, G[x] is a polynomial ring of this kind. 
Since $\mathfrak{J}$ is a u.f. ring, the greatest common divisor 

(30) $\mathrm{GCD}(a_0, a_1, \ldots, a_n) = d$ 

is uniquely determined up to a unit. To determine its value we carry out 
the canonical factorization (24) of the coefficients $a_i$ in the usual way and 
select all the common prime factors. If we then set 

$a_0 = d a_0^*, \quad a_1 = d a_1^*, \quad \ldots, \quad a_n = d a_n^*,$ 

we have 

(31) $f(x) = d f^*(x)$ with $f^*(x) = a_0^* + a_1^* x + a_2^* x^2 + \cdots + a_n^* x^n$ 

and 

(31') $\mathrm{GCD}(a_0^*, a_1^*, \ldots, a_n^*) = 1.$ 

A polynomial in $\mathfrak{J}[x]$ whose coefficients, like those of $f^*(x)$, have no 
common factors except the units is said to be primitive. Thus we have the 
result that every polynomial f(x) in $\mathfrak{J}[x]$ can be written uniquely (apart from 
units) as the product (31) of the GCD of its coefficients (30) and a primitive 
polynomial $f^*(x)$.$^{32}$ 
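
For the special case of integral coefficients, the decomposition (31) can be sketched as follows. The function names `content` and `primitive_part` are illustrative, and the coefficient list is assumed to run from $a_0$ up to $a_n$.

```python
from functools import reduce
from math import gcd


def content(coeffs):
    """GCD of the integer coefficients a_0, ..., a_n (nonnegative representative)."""
    return reduce(gcd, coeffs, 0)


def primitive_part(coeffs):
    """Return (d, f_star) with f = d * f_star and f_star primitive, as in (31)."""
    d = content(coeffs)
    if d == 0:
        raise ValueError("the zero polynomial has no primitive part")
    return d, [c // d for c in coeffs]


# Example: f(x) = 6 - 9x + 15x^2 = 3 * (2 - 3x + 5x^2)
d, f_star = primitive_part([6, -9, 15])
```

The choice of the nonnegative GCD is only one of the two associates; as the text notes, d is determined only up to a unit.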

Along with $\mathfrak{J}[x]$ we now consider the polynomial ring $\mathfrak{K}[x]$ over the 
quotient field $\mathfrak{K}$ (cf. §1.12 and §2.6) of the u.f. ring $\mathfrak{J}$. By the analogous 
argument we see that every polynomial f(x) in $\mathfrak{K}[x]$ can be written uniquely 
(apart from units) as the product 

(32) $f(x) = \frac{a}{b} f^*(x),$ 

where a, b are coprime elements of $\mathfrak{J}$ and $f^*(x)$ is a primitive polynomial in 
$\mathfrak{J}[x]$. 


31 For degree 0 we thus obtain all the elements of $\mathfrak{J}$, so that $\mathfrak{J}$ is a subring of $\mathfrak{J}[x]$. 
Under our present convention the zero element of the ring $\mathfrak{J}$, like every other element 
in the same ring, has the degree 0. In certain other contexts, which we shall not discuss 
here, it is convenient to leave the degree of the zero element undetermined. If we wish 
to preserve the formula 

degree $f(x) \cdot g(x)$ = degree $f(x)$ + degree $g(x)$ 

for identically vanishing factors, we may set degree $0 = -\infty$ (cf. p. 361, ftn. 7). 

32 If n = 0, we set $f^*(x) = 1$. 


2.12. Theorem of Gauss for Primitive Polynomials 
The product of two primitive polynomials is again a primitive polynomial. 
For if the polynomial f(x) in (29) is primitive, and if g(x) is another 
primitive polynomial 

$g(x) = b_0 + b_1 x + b_2 x^2 + \cdots + b_m x^m,$ 

then their product is a polynomial 

$h(x) = f(x) g(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_{n+m} x^{n+m}$ 

with coefficients formed as follows:$^{33}$ 

$c_0 = a_0 b_0,$ 
$c_1 = a_0 b_1 + a_1 b_0,$ 
$\ldots$ 

Now let it be assumed that $\pi$ is a prime divisor of all the coefficients 
$c_0, c_1, \ldots, c_{n+m}$; then since the polynomials f(x) and g(x) are both 
primitive, there exist indices j and k ($0 \leq j \leq n$, $0 \leq k \leq m$) such that $\pi$ 
is not a divisor of $a_j$ or $b_k$ but is a divisor of all the preceding $a_0, \ldots, a_{j-1}$ 
and $b_0, \ldots, b_{k-1}$. Then, in contradiction to our assumption, the coefficient 

$c_{j+k} = a_0 b_{j+k} + a_1 b_{j+k-1} + \cdots + a_j b_k + \cdots + a_{j+k-1} b_1 + a_{j+k} b_0$ 

is certainly not divisible by $\pi$, since all the terms in this sum are divisible 
by $\pi$, with the single exception of the term $a_j b_k$. Thus h(x) is primitive. 
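
A small numerical check of the theorem of Gauss over the integers may be helpful. It is a sketch only: the helper names and the two sample polynomials are assumptions made for illustration, with coefficient lists running from the constant term upward.

```python
from functools import reduce
from math import gcd


def content(coeffs):
    """GCD of the integer coefficients (0 for the zero polynomial)."""
    return reduce(gcd, coeffs, 0)


def poly_mul(f, g):
    """Coefficients c_0, ..., c_{n+m} of the product, with c_k = sum of a_i * b_(k-i)."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h


f = [2, 3]        # 2 + 3x, primitive
g = [3, 0, 5]     # 3 + 5x^2, primitive
h = poly_mul(f, g)  # 6 + 9x + 10x^2 + 15x^3
```

Although every single coefficient of h has a nontrivial factor, no prime divides all of them, so h is again primitive.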


2.13. The Polynomial Ring $\mathfrak{J}[x]$ Satisfies the Divisor Chain Condition (§2.5) 

For if an element $f(x) = d f^*(x)$ written in the form (31) has a proper 
divisor $g(x) = b g^*(x)$, then b | d and $g^*(x) \mid f^*(x)$, where the polynomials 
$g^*(x)$ and $f^*(x)$ are primitive and at least one of the divisions is proper. 
But the element d of the u.f. ring $\mathfrak{J}$ has only finitely many divisors b (up to 
associates), and the proper divisor $g^*(x)$ must be of lower degree than $f^*(x)$. 
Thus every proper divisor chain (22) in $\mathfrak{J}[x]$ is finite. 


2.14. A polynomial ring $\mathfrak{J}[x]$ is a u.f. ring if and only if the domain of 
coefficients $\mathfrak{J}$ is a u.f. ring. In particular, G[x] is a u.f. ring.$^{34}$ 
The condition is obviously necessary; in order to show that it is also 
sufficient, we need only prove, in view of §2.6 and §2.13, that every irreducible 
element in $\mathfrak{J}[x]$ is prime. Now an irreducible element in $\mathfrak{J}[x]$ is 
either independent of x (i.e., the representation (29) has the degree n = 0), 
when it is an irreducible and therefore prime element $\pi$ of the u.f. ring $\mathfrak{J}$, 
or else it is an irreducible and therefore primitive polynomial $p^*(x)$. 

33 In order to avoid a troublesome listing of various cases, we shall assume that 
all $a_j$ with index j > n and all $b_k$ with index k > m are set equal to zero. 

34 Note that G[x] is not a Euclidean ring (§2.10). 

In the first case, if the product f(x) g(x) of two polynomials in $\mathfrak{J}[x]$ is 
divisible by $\pi$, i.e., if in the notation of §2.13 

$\pi \mid f(x) g(x)$, or also $\pi \mid bd\, f^*(x) g^*(x)$, 

it follows that $\pi \mid bd$. But $\pi$ is prime in $\mathfrak{J}$, so that one of the two factors 
b, d, say d, is divisible by $\pi$: $\pi \mid d \mid f(x)$. Thus $\pi$ is also prime in $\mathfrak{J}[x]$. 
In the second case it follows from 

$p^*(x) \mid f(x) g(x)$, or $p^*(x) \mid bd\, f^*(x) g^*(x)$, 

in view of the fact that $p^*(x)$ and also the product $f^*(x) g^*(x)$ are 
primitive (§2.12), that 

$p^*(x) \mid f^*(x) g^*(x).$ 

If we examine these divisibility relations (see §2.11) in $\mathfrak{K}[x]$, which is a 
u.f. ring (§2.10), we see that one of the two factors, say $f^*(x)$, is divisible 
by the prime element$^{35}$ $p^*(x)$: 

$f^*(x) = p^*(x) h(x)$ with $h(x) \in \mathfrak{K}[x]$. 

By §2.11 we can write $h(x) = \frac{a}{b} h^*(x)$, with $h^*(x)$ primitive in $\mathfrak{J}[x]$, 
so that the above equation, after multiplication by b, gives: 

$b f^*(x) = a\, p^*(x) h^*(x);$ 

thus we see from the theorem of Gauss (§2.12) that a = eb and 
$f^*(x) = e\, p^*(x) h^*(x)$, where e is a unit, so that $f(x) = d f^*(x)$ is in fact 
divisible by $p^*(x)$ with a quotient in $\mathfrak{J}[x]$, as was to be proved. 


A slight generalization of this theorem leads to the result: not only the 
polynomial rings R[x] and G[x] in one indeterminate, but also the rings 
R[x, y], G[x, y], R[x, y, z], G[x, y, z], ... in several indeterminates are u.f. 
rings. The same result holds for all polynomial rings $\mathfrak{J}[x, y]$, $\mathfrak{J}[x, y, z]$, ..., 
provided $\mathfrak{J}$ is a u.f. ring. The proof follows at once from the remark that the 
polynomial ring $\mathfrak{J}[x, y]$ can also be regarded as a polynomial ring $\mathfrak{J}^*[y]$ if we 
set $\mathfrak{J}^* = \mathfrak{J}[x]$. But $\mathfrak{J}^*$ is a u.f. ring, so that $\mathfrak{J}^*[y] = \mathfrak{J}[x, y]$ is also such 
a ring, and so forth. 


35 The polynomial $p^*(x)$ is prime, since it too is irreducible in $\mathfrak{K}[x]$. For if $p^*(x)$ were 
reducible in $\mathfrak{K}[x]$: 

$p^*(x) = p_1(x)\, p_2(x), \quad p_1(x) = \frac{a}{b}\, p_1^*(x), \quad p_2(x) = \frac{c}{d}\, p_2^*(x)$ 

with $a, b, c, d \in \mathfrak{J}$ and primitive polynomials $p_1^*(x)$, $p_2^*(x)$ in $\mathfrak{J}[x]$ of degree less than 
the degree of $p^*(x)$, we would have 

$bd\, p^*(x) = ac\, p_1^*(x)\, p_2^*(x);$ 

but then by the theorem of Gauss (§2.12) we could write ac = ebd with a unit e 
and $p^*(x) = e\, p_1^*(x)\, p_2^*(x)$, in contradiction to the assumed irreducibility of $p^*(x)$ in 
$\mathfrak{J}[x]$. 


3. Ideals in Commutative Rings, Principal Ideal Rings, 
Residue Class Rings 


3.1. In IB2, §3 we saw that the inner structure of a group can best be 
determined by a study of its subgroups, particularly of its invariant 
subgroups; thus it is natural, in investigating the inner structure$^{36}$ of a ring 
$\mathfrak{R}$, to examine its subrings. Here also subrings of a certain type will be 
distinguished, namely the ideals. 

Definition of an ideal. A non-empty subset $\mathfrak{a}$ of a ring $\mathfrak{R}$ is called an 
"ideal" if it has the following properties: 

(I) the module property 

(33) $a, b \in \mathfrak{a}$ implies $a - b \in \mathfrak{a}$; 

(II) the ideal property 

(34) $a \in \mathfrak{a}$ and $r \in \mathfrak{R}$ imply $ra \in \mathfrak{a}$. 

The module property (I) implies that every ideal of a ring is a subgroup 
of the additive group of the ring (§1.4). To see this, we note that every 
ideal $\mathfrak{a}$ contains the zero element a − a = 0 and thus by (33) contains 
the inverse −a = 0 − a of every element a in the ideal. But then, from 
a − (−b) = a + b it follows$^{37}$ by (33) that 

(33') $a, b \in \mathfrak{a}$ implies $a + b \in \mathfrak{a}$. 

Similarly the ideal property (II) implies that $\mathfrak{a}$ is a subring of $\mathfrak{R}$; that is, 
$\mathfrak{a}$ is closed not only with respect to addition (33') and subtraction (33), 
but also under multiplication; for from $a, b \in \mathfrak{a}$ it follows by (34) that 
$ab \in \mathfrak{a}$. But (34) makes a stronger demand, namely that the product ab 
shall still lie in $\mathfrak{a}$ even though only one of the two factors lies in $\mathfrak{a}$. Our 
reason for confining attention to the subrings distinguished in this way will 
become clear in our discussion of congruences and residue class rings (§3.6). 

The zero element by itself forms an ideal, the zero ideal, denoted by 
(0); in the same way the ring $\mathfrak{R}$ is itself an ideal, the unit ideal. These two 
ideals occur in every ring, and in a field they are the only ideals; for in a 
field $\mathfrak{K}$ any ideal $\mathfrak{a}$ containing a nonzero element must contain all 
the elements of $\mathfrak{K}$, by the field postulate (§1.10) and the ideal property (34). 

36 By the inner structure of a ring we chiefly mean the divisibility relations in the ring. 

37 Thus we can derive (33') from (33), but not conversely. 

From (33') and (34) we can now derive the following important property 
of an ideal $\mathfrak{a}$ in a ring $\mathfrak{R}$: 

(35) $a, b \in \mathfrak{a}$ and $r, s \in \mathfrak{R}$ imply $ra + sb \in \mathfrak{a}$. 


3.2. More generally we can say: If an ideal $\mathfrak{a}$ contains the elements 
$a_1, a_2, \ldots, a_s$, then it contains all the linear combinations 

(36) $r_1 a_1 + r_2 a_2 + \cdots + r_s a_s$ with $r_1, r_2, \ldots, r_s \in \mathfrak{R}$. 

Conversely, the set of all elements (36) forms an ideal, call it $\mathfrak{b}$, as can be 
verified at once from the conditions (33) and (34). The ideal $\mathfrak{b}$$^{38}$ is said to 
be generated by the elements $a_1, a_2, \ldots, a_s$ and is written: 

(37) $\mathfrak{b} = (a_1, a_2, \ldots, a_s).$ 

The elements $a_1, a_2, \ldots, a_s$ are said to form a basis for the ideal $\mathfrak{b}$, although 
it is not thereby asserted that the basis cannot be shortened, i.e., that the 
same ideal cannot be generated by fewer basis elements. 

A priori it is not clear whether every ideal in an arbitrary ring has a 
finite basis or not; nevertheless, in all the rings considered here it is true 
that every ideal has a finite basis. Such rings are said to satisfy the basis 
condition and are called Noetherian rings. 

Particularly important are the rings in which every ideal $\mathfrak{a}$ has a basis 
consisting of a single element: $\mathfrak{a} = (a)$. In this case $\mathfrak{a}$ is the set of all 
multiples ra ($r \in \mathfrak{R}$) of a; in other words, the ideal comprises the set of all 
elements in the ring $\mathfrak{R}$ that are divisible by a. Such ideals are called 
principal ideals, and if all the ideals of a ring are principal, the ring is 
called a principal ideal ring. 


3.3. Every Euclidean Ring Is a Principal Ideal Ring 

For if $\mathfrak{J}$ is a Euclidean ring (§2.8) and $\mathfrak{a}$ is an arbitrary ideal ($\neq (0)$) in $\mathfrak{J}$, 
let a ($\neq 0$) be an element in $\mathfrak{a}$ such that the function H(a) defined in §2.8 
assumes its smallest value. Then for any b in $\mathfrak{a}$ the division algorithm 
(26"), applied to b and a, produces an equation b = aq + r. By (35) the 
element r = b − aq is contained in $\mathfrak{a}$ and, if it is not zero, satisfies the condition 
H(r) < H(a), in contradiction to the minimality of H(a). Thus r = 0 and 
b = aq with $q \in \mathfrak{J}$. Consequently, $\mathfrak{a} = (a)$ is a principal ideal, as was to be 
proved. 


38 It is assumed here that the ring $\mathfrak{R}$ contains a unity element. The ideal $\mathfrak{b}$ is a subideal 
of $\mathfrak{a}$, which may coincide with $\mathfrak{a}$; in symbols: 

$\mathfrak{a} \supseteq \mathfrak{b}$ or $\mathfrak{b} \subseteq \mathfrak{a}$. 


In particular, the rings G and R[x], which are Euclidean rings by §2.8, are 
principal ideal rings.$^{39}$ 

We already know (§2.10) that every Euclidean ring is a u.f. ring; 
somewhat more generally, we have the following theorem. 


3.4. Every Principal Ideal Ring Is a U.F. Ring 

For let $\mathfrak{J}$ be a principal ideal ring, i.e., an integral domain with unity 
element, in which every ideal has a basis consisting of a single element; 
then by §2.6 we must prove that every irreducible element in $\mathfrak{J}$ is prime 
and that $\mathfrak{J}$ satisfies the divisor chain condition. 

The greatest common divisor (§2.7) 

(38) $\mathrm{GCD}(a_1, a_2, \ldots, a_s) = d$ 

of two or more elements is always a well-defined$^{40}$ element of $\mathfrak{J}$, and d is 
the basis element of the principal ideal generated by the $a_i$, 

(38') $(a_1, a_2, \ldots, a_s) = (d).$ 

For on the one hand all the $a_i$ are divisible by d, and on the other d is a 
linear combination (36) of the $a_i$ and is thus divisible by each of their 
common factors; consequently, d is their greatest common factor. 
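
In the principal ideal ring G this can be observed directly. The sketch below is illustrative only: the names `generator` and `linear_combination`, the search bound, and the sample elements 12, 18, 30 are assumptions, and the brute-force search merely exhibits one linear combination (36) equal to d.

```python
from functools import reduce
from itertools import product
from math import gcd


def generator(elements):
    """Basis element d of the principal ideal (a_1, ..., a_s) in the integers."""
    return reduce(gcd, elements, 0)


def linear_combination(elements, bound=5):
    """Search small integer coefficients r_i with r_1*a_1 + ... + r_s*a_s == d."""
    d = generator(elements)
    for coeffs in product(range(-bound, bound + 1), repeat=len(elements)):
        if sum(r * a for r, a in zip(coeffs, elements)) == d:
            return coeffs
    return None  # nothing found within the (arbitrary) search bound


elements = [12, 18, 30]
d = generator(elements)
coeffs = linear_combination(elements)
```

Every element of the list is a multiple of d, and d itself lies in the ideal, which is the content of (38').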

In particular, if two elements a, b in $\mathfrak{J}$ are coprime, i.e., if 
GCD(a, b) = 1, then the principal ideal (a, b) = (1) generated by them 
is the unit ideal, so that there exist elements x, y in $\mathfrak{J}$ satisfying the equation 

$ax + by = 1.$ 

It follows, exactly as in §2.10 (III), that in a principal ideal ring $\mathfrak{J}$ every 
irreducible element is prime. 

Now let $a_1, a_2, \ldots, a_i, \ldots$ be a divisor chain (22) in $\mathfrak{J}$, and form the 
set $\mathfrak{m}$ of all elements of $\mathfrak{J}$ that are divisible by any of the $a_i$.$^{41}$ This set $\mathfrak{m}$ 
is an ideal in $\mathfrak{J}$; for if a, b are any two elements of the set $\mathfrak{m}$, then there 
exists a first $a_i$ in the divisor chain that is a divisor of a, and also a first $a_j$ 
that is a divisor of b. Let us assume that $j \geq i$; then $a_j$ is a common divisor 
of a and b and thus a divisor of a − b, so that $a - b \in \mathfrak{m}$, and $\mathfrak{m}$ has the 
module property (33). The condition (34) is obviously satisfied, since 
together with a all elements ra ($r \in \mathfrak{J}$) are divisible by $a_i$ and therefore are 
contained in $\mathfrak{m}$. 

By hypothesis this ideal $\mathfrak{m}$ has a one-element basis: $\mathfrak{m} = (m)$. Since the 
basis element m is an element of $\mathfrak{m}$, it must be divisible by a first $a_k$: $a_k \mid m$. 
If l > k, then $a_l \mid a_k$, and since $a_l \in \mathfrak{m} = (m)$, we also have $m \mid a_l$, so 
that $a_k \mid m \mid a_l$, or $a_k \mid a_l$; in other words, $a_k$ is an associate of every 
subsequent element of the divisor chain. Thus a proper divisor chain in $\mathfrak{J}$ 
cannot have infinitely many terms; in other words, the divisor chain 
condition holds in $\mathfrak{J}$, which completes the proof that $\mathfrak{J}$ is a u.f. ring. 


39 On the other hand, the integral polynomial ring G[x] is a Noetherian ring but not 
a principal ideal ring. 

40 Defined up to a unit factor; in (38) and (38') d can be replaced by any of its 
associates. 

41 An element that is divisible by $a_i$ is also, of course, divisible by each of the 
subsequent $a_{i+1}, \ldots$. 


But the converse of this theorem does not hold; for example, the integral 
polynomial ring G[x] is a u.f. ring (§2.14) but not a principal ideal ring; for it is 
easy to see (cf. §2.10) that the ideal (x, 2) in G[x] is not a principal ideal. 


3.5. Congruences Modulo $\mathfrak{a}$ 


Two elements a and a′ of a ring $\mathfrak{R}$ are said to be "congruent modulo $\mathfrak{a}$," 
or in symbols: $a \equiv a'\ (\mathfrak{a})$, if their difference a − a′ is contained in the 
ideal $\mathfrak{a}$: 

(39) $a \equiv a'\ (\mathfrak{a})$ means the same as $a - a' \in \mathfrak{a}$. 

In particular, the congruence $a \equiv 0\ (\mathfrak{a})$ means that the element a itself 
is in the ideal $\mathfrak{a}$. 

With these congruences we can compute in exactly the same way as 
with equations: for it follows from $a \equiv a'\ (\mathfrak{a})$ and $b \equiv b'\ (\mathfrak{a})$ that 

(40) $a + b \equiv a' + b', \quad a - b \equiv a' - b', \quad ab \equiv a'b'\ (\mathfrak{a}).$ 

To prove the first of these three congruences we note (cf. (13)) that: 

$(a + b) - (a' + b') = (a - a') + (b - b') \in \mathfrak{a}.$ 

The proof of the second is analogous, and by (35) the third follows from: 

$ab - a'b' = (a - a')b + a'(b - b') \in \mathfrak{a}.$ 

The fact that this last conclusion requires the ideal property (34) explains 
the peculiar importance of ideals in the class of all subrings (cf. §3.1). 
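
The rules (40) can be checked in the simplest case, the ring G with a principal ideal (n). This is a sketch only: the function `congruent` and the sample numbers are illustrative assumptions.

```python
def congruent(a, b, n):
    """True if a - b lies in the ideal (n) of the integers, cf. (39)."""
    return (a - b) % n == 0


n = 12
a, a_prime = 5, 29     # 5 - 29 = -24 lies in (12)
b, b_prime = 7, -17    # 7 - (-17) = 24 lies in (12)

sum_ok = congruent(a + b, a_prime + b_prime, n)
difference_ok = congruent(a - b, a_prime - b_prime, n)
product_ok = congruent(a * b, a_prime * b_prime, n)

# The identity used for the product rule: ab - a'b' = (a - a')b + a'(b - b')
identity_ok = a * b - a_prime * b_prime == (a - a_prime) * b + a_prime * (b - b_prime)
```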


3.6. Residue Classes 


The congruence relation just defined is an equivalence relation$^{42}$ (IA, 
§8.5, and IB1, §2.2) for the elements of the ring $\mathfrak{R}$ and therefore generates 
a partition of these elements into classes, called "residue classes" modulo $\mathfrak{a}$. 
Every residue class is uniquely determined by any element a contained in it, 
since together with a it contains all the elements a′ satisfying (39).$^{43}$ In 
particular, the residue class containing the zero element coincides with 
the ideal $\mathfrak{a}$. 

42 It is easily shown that congruence is reflexive, symmetric, and transitive (cf. IA, 
§8.3 and IB1, §2.2). 

43 Every element of the ring lies in exactly one residue class; two distinct residue 
classes have no element in common. 

By [a] we denote the residue class modulo $\mathfrak{a}$ containing the element a and 
say that a is a representative of this class; any other element a′ of the same 
class will serve equally well as a representative. 

We now consider these residue classes [a], [b], [c], ... as new elements, 
for which addition and multiplication can be defined in accordance with 
the postulates in §1.2 for a ring, with the result that the residue classes 
form a new ring, the residue class ring $\mathfrak{R}/\mathfrak{a}$.$^{44}$ The sum and the product of 
two residue classes [a] and [b] are naturally defined by 

(41) $[a] + [b] = [a + b], \quad [a][b] = [ab].$ 

This definition is unique, since the result remains the same (cf. (40)) for 
all other choices a′ and b′ of the representatives for the two classes. 

Computation with residue classes modulo $\mathfrak{a}$ is essentially the same as 
with the congruences modulo $\mathfrak{a}$ described in §3.5, where two elements 
congruent modulo $\mathfrak{a}$ are regarded as equal to each other. Consequently, 
all the ring postulates (§1.2) are satisfied in the residue class ring, since they 
are valid for the elements of the original ring $\mathfrak{R}$. 


If to every element a in $\mathfrak{R}$ we assign the residue class [a] of the residue class 
ring $\mathfrak{R}/\mathfrak{a}$, we have a homomorphism $\mathfrak{R} \to \mathfrak{R}/\mathfrak{a}$ in the sense of §1.13, in which 
all the elements of the ideal $\mathfrak{a}$ are mapped onto the zero element of the residue 
class ring $\mathfrak{R}/\mathfrak{a}$. Thus every residue class ring $\mathfrak{R}/\mathfrak{a}$ is a homomorphic image of 
the ring $\mathfrak{R}$. 

Conversely, if a ring $\mathfrak{R}^*$ is the homomorphic image of $\mathfrak{R}$: $\mathfrak{R} \to \mathfrak{R}^*$, there 
exists an ideal $\mathfrak{a}$ in $\mathfrak{R}$ such that the residue class ring $\mathfrak{R}/\mathfrak{a}$ is isomorphic to 
$\mathfrak{R}^*$: $\mathfrak{R}/\mathfrak{a} \cong \mathfrak{R}^*$ (the homomorphism theorem for rings). The ideal $\mathfrak{a}$ consists 
of the set of all elements of $\mathfrak{R}$ mapped by the homomorphism $\mathfrak{R} \to \mathfrak{R}^*$ onto 
the zero element of $\mathfrak{R}^*$, where it is clear that this set is actually an ideal, since 
addition and multiplication are preserved under a homomorphism; i.e., if 
a and b are mapped onto the zero element in $\mathfrak{R}^*$, then so is ra + sb, $r, s \in \mathfrak{R}$. 
We see that the elements of the residue class ring $\mathfrak{R}/\mathfrak{a}$ are in one-to-one 
correspondence with the elements of $\mathfrak{R}^*$. 


An ideal $\mathfrak{p}$ whose residue class ring $\mathfrak{R}/\mathfrak{p}$ has no divisors of zero is called 
a prime ideal; thus a prime ideal is characterized by the property that a 
product ab is contained in $\mathfrak{p}$ if and only if at least one factor a, b lies in $\mathfrak{p}$ 
(cf. p. 340, ftn. 2). 


A primary ideal $\mathfrak{q}$ is an ideal whose residue class ring $\mathfrak{R}/\mathfrak{q}$ contains only 
nilpotent divisors of zero, i.e., elements which become equal to zero when raised 
to a certain power. Thus we can characterize a primary ideal $\mathfrak{q}$ by the following 
condition: 

(42) $ab \in \mathfrak{q}$ and $a \notin \mathfrak{q}$ imply $b^{\rho} \in \mathfrak{q}$ 

for a natural number $\rho$. Setting $\rho = 1$, we have the condition for a prime ideal, 
so that prime ideals are special cases of primary ideals. To every primary 
ideal $\mathfrak{q}$ there corresponds a prime ideal $\mathfrak{p}$ consisting of all the elements of $\mathfrak{R}$ 
that lie in nilpotent residue classes modulo $\mathfrak{q}$. 

44 The residue class ring corresponds to the concept in group theory (IB2, §3.3) of 
a factor group with respect to a normal subgroup. 

In Noetherian rings (§3.2) we have the general factorization theorem: every 
ideal can be represented (in an essentially unique way) as the intersection of 
finitely many primary ideals corresponding to distinct prime ideals. To a certain 
extent, this factorization theorem for ideals takes the place of the (no longer 
valid) u.f. theorem. Somewhat more special are the rings occurring in the 
theory of algebraic numbers; in these rings of the “classical ideal theory” 
we have the following factorization theorem: every ideal can be represented 
(uniquely, apart from the order of the factors) as a product of prime ideals 
(cf. also IB6, §8.2). 

The reader should compare the discussion in IB11, §3. 


3.7. Residue Class Rings G/n$^{45}$ 


The integral domain G of rational integers is a principal ideal ring 
(§3.3); every ideal (in this context often called a "module") can thus be 
written as the principal ideal $\mathfrak{a} = (n)$ consisting of all the multiples in G 
of the natural number n. In every residue class there is exactly one number 
between 0 and n − 1 (inclusive); this number may be chosen as the 
representative of the class. Thus there are exactly n residue classes 

$[0], [1], [2], \ldots, [n - 1].$ 

With these classes we compute exactly as with integers, except that the 
result of a computation modulo n must be reduced to the smallest 
nonnegative remainder (cf. IB6, §4.1). 

The residue class ring G/n is finite$^{46}$ and has no divisors of zero if and 
only if n = p is a prime; as a finite integral domain (§1.10) it is then a 
field, namely the prime field of characteristic p (§1.11). 
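
For small n the statement can be verified by exhaustive search. The sketch below is illustrative: the function names are assumptions, and each residue class is represented by its smallest nonnegative remainder.

```python
def zero_divisors(n):
    """Nonzero classes [a] in G/n with [a][b] = [0] for some nonzero [b]."""
    return [a for a in range(1, n)
            if any((a * b) % n == 0 for b in range(1, n))]


def is_field(n):
    """True if every nonzero class in G/n has a multiplicative inverse."""
    return n > 1 and all(any((a * b) % n == 1 for b in range(1, n))
                         for a in range(1, n))


composite_case = zero_divisors(6)   # 2 * 3 = 6 is congruent to 0 modulo 6
prime_case = zero_divisors(7)
```

In G/6 the classes [2], [3], [4] are divisors of zero, whereas G/7 has none and is a field.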


3.8. The Integral Domain of the Gaussian Numbers G[i] 
as Residue Class Ring G[x]/$\mathfrak{p}$ 


In the polynomial ring G[x] we now consider the principal ideal$^{47}$ 

(43) $\mathfrak{p} = (x^2 + 1);$ 

if p(x) is any polynomial in G[x], division of p(x) by $x^2 + 1$ gives the 
equation 

(43') $p(x) = (x^2 + 1) q(x) + (ax + b)$, with $q(x) \in G[x]$, $a, b \in G$, 

so that 

(43'') $p(x) \equiv ax + b\ (\mathfrak{p}).$ 

45 The notation G/n is often used in place of the more exact G/(n). 

46 A "finite" ring is a ring with only finitely many elements. 

47 This ideal $\mathfrak{p}$ is a prime ideal, since the basis polynomial $x^2 + 1$ is irreducible in 
G[x] and is therefore prime (§2.14). In other words, a product f(x) g(x) of two 
polynomials in G[x] is contained in $\mathfrak{p}$ (i.e., is divisible by $x^2 + 1$) if and only if at least one of 
the factors is already contained in $\mathfrak{p}$. 

So in every residue class modulo $\mathfrak{p}$ there exists exactly one integral 
polynomial ax + b of degree $\leq 1$; and, of course, we may choose this 
particular polynomial as representative of its class. Computation with 
these classes, i.e., with congruences modulo $\mathfrak{p}$, follows the rules: 

$(ax + b) \pm (a'x + b') \equiv (a \pm a')x + (b \pm b'),$ 
$(ax + b)(a'x + b') \equiv (ab' + a'b)x + (bb' - aa')\ (\mathfrak{p});$ 

but exactly the same rules must be followed if we replace x by the 
imaginary unit i and write ordinary equations instead of the congruences 
modulo $\mathfrak{p}$. Consequently, except for the somewhat different notation, 
the residue class ring G[x]/$\mathfrak{p}$ is identical with the integral domain G[i] of 
Gaussian integers (§1.1); in other words we have the isomorphism: 

(44) $G[i] \cong G[x]/(x^2 + 1).$ 

In exactly the same way we can show that the field C of complex numbers 
is isomorphic to the residue class ring $K[x]/(x^2 + 1)$, where K is the field 
of real numbers (see also IB8, §1.2). 
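
The multiplication rule modulo $x^2 + 1$ can be compared with ordinary complex arithmetic. This is a sketch under stated assumptions: a class is stored as the pair (a, b) standing for ax + b, and the names `reduce_mod` and `multiply_classes` are illustrative.

```python
def reduce_mod(coeffs):
    """Reduce an integral polynomial (c_0, c_1, c_2, ...) modulo x^2 + 1.

    Since x^2 is congruent to -1, the power x^k contributes (-1)^(k // 2)
    times x^(k % 2). Returns the pair (a, b) of the representative ax + b.
    """
    a = b = 0
    for k, c in enumerate(coeffs):
        sign = -1 if (k // 2) % 2 else 1
        if k % 2:
            a += sign * c
        else:
            b += sign * c
    return a, b


def multiply_classes(u, v):
    """Product of the classes of ax + b and a'x + b' by the rule of the text."""
    a, b = u
    a2, b2 = v
    return a * b2 + a2 * b, b * b2 - a * a2


# (2x + 3)(-x + 4) = -2x^2 + 5x + 12, which reduces to 5x + 14
product_class = multiply_classes((2, 3), (-1, 4))
reduced = reduce_mod([12, 5, -2])
as_complex = complex(3, 2) * complex(4, -1)   # (3 + 2i)(4 - i)
```

Both computations give the same pair, and it agrees with the real and imaginary parts of the complex product, in line with the isomorphism (44).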


3.9. Residue Class Rings $\mathfrak{K}[x]/f(x)$ 

Now let f(x) be any polynomial of degree $\geq 1$ in the polynomial ring 
$\mathfrak{K}[x]$ over an arbitrary base field $\mathfrak{K}$, and consider the residue class ring 
$\mathfrak{K}[x]/\mathfrak{a}$ of $\mathfrak{K}[x]$ with respect to the principal ideal $\mathfrak{a} = (f(x))$, for which 
we also write $\mathfrak{K}[x]/f(x)$. 

If f(x) is of degree n in x, every polynomial p(x) in $\mathfrak{K}[x]$ can be 
reduced, if necessary by the division algorithm (26"), to a polynomial of 
degree $\leq n - 1$: 

(45) $p(x) = f(x) q(x) + r(x),$ 

so that 

(45') $p(x) \equiv r(x)\ (\mathfrak{a}),$ 

with $r(x) = c_1 + c_2 x + c_3 x^2 + \cdots + c_n x^{n-1}$. So in every residue class 
modulo $\mathfrak{a}$ there is exactly one polynomial r(x) of degree $\leq n - 1$ with 
coefficients in $\mathfrak{K}$. The residue classes can be put in one-to-one 
correspondence with these polynomials. 

Computation with the residue classes is the same as ordinary 
computation with the polynomials r(x), except that the result must be 
reduced modulo f(x) whenever the degree exceeds n − 1. The calculations 
become simpler if we introduce the following notation for the residue 
classes 

(46) $[1] = e_1, \quad [x] = e_2, \quad [x^2] = e_3, \quad \ldots, \quad [x^{n-1}] = e_n.$ 

Then the residue class represented by the polynomial 

$r(x) = c_1 + c_2 x + \cdots + c_n x^{n-1}$ 

can be written as a linear form$^{48}$ in the $e_i$: 

(46') $c_1 e_1 + c_2 e_2 + \cdots + c_n e_n.$ 

It is clear how such linear forms are to be added, but for multiplication 
we need a multiplication table, which must be constructed by computing 
congruences modulo f(x). It is sufficient to calculate the result$^{49}$ for all 
products $e_i e_j$: 

(46'') $e_i e_j = \gamma_{ij}^{1} e_1 + \gamma_{ij}^{2} e_2 + \cdots + \gamma_{ij}^{n} e_n, \quad i, j = 1, \ldots, n.$ 

When the coefficients $\gamma_{ij}^{k}$ have been determined, it is easy to carry out 
all the operations (except division) on the linear forms (46'), and in each 
case the result is a linear form denoting the same residue class as would 
result from the same operations applied to residue classes. 

A ring consisting of linear forms (46') provided with a multiplication 
table (46'') is called an algebra or a hypercomplex system. Thus we can 
now say: every residue class ring $\mathfrak{K}[x]/f(x)$ is a (commutative) algebra over 
the base field $\mathfrak{K}$. 
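
The construction of such a multiplication table can be sketched for rational coefficients. The example polynomial $x^3 - 2$, the function names, and the storage of polynomials as coefficient lists from the constant term upward are illustrative assumptions, not part of the text.

```python
from fractions import Fraction


def poly_rem(p, f):
    """Remainder r(x) of p(x) modulo f(x), as a list of n = deg f coefficients."""
    p = [Fraction(c) for c in p]
    n = len(f) - 1
    lead = Fraction(f[-1])
    while len(p) - 1 >= n:
        factor = p[-1] / lead
        shift = len(p) - 1 - n
        for i, c in enumerate(f):
            p[shift + i] -= factor * c
        p.pop()  # the leading coefficient has just been cancelled
    return p + [Fraction(0)] * (n - len(p))


def multiplication_table(f):
    """Map (i, j) to the coefficients of e_i * e_j in the basis e_1, ..., e_n."""
    n = len(f) - 1
    table = {}
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            monomial = [0] * (i + j - 2) + [1]   # e_i * e_j contains x^(i+j-2)
            table[(i, j)] = poly_rem(monomial, f)
    return table


table = multiplication_table([-2, 0, 0, 1])   # f(x) = x^3 - 2
```

For this f one finds, for instance, $e_2 e_3 = 2e_1$ and $e_3 e_3 = 2e_2$, because $x^3 \equiv 2$ modulo f(x).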


3.10. Residue Class Rings as Field Extensions 
If 

(47) $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \quad (n \geq 1)$ 

is an irreducible polynomial in $\mathfrak{K}[x]$, the residue class ring $\mathfrak{K}[x]/f(x)$ is a 
field, which we shall denote by $\Omega$. 


48 A form is a homogeneous polynomial, a linear form is a linear homogeneous 
polynomial. 

49 By (46) the residue class $e_i e_j$ contains $x^{i+j-2}$; if $i + j - 2 \leq n - 1$, we have 
simply $e_i e_j = e_{i+j-1}$; otherwise, we must reduce $x^{i+j-2}$ by (45) and express the resulting 
r(x) by (46) as a linear form. 

For if the polynomial p(x) is not divisible by f(x), so that by the irreducibility 
of f(x) the polynomials p(x) and f(x) are coprime, then by (28) the Euclidean 
ring $\mathfrak{K}[x]$ contains two polynomials h(x) and k(x) with 

$p(x) h(x) + f(x) k(x) = 1;$ 

and therefore, since [f(x)] = [0], 

$[p(x)][h(x)] = [1],$ 

which means that every nonzero element in $\Omega$ is a unit (§2.2), so that $\Omega$ is a 
field. 

The field $\Omega$ contains a subfield $\mathfrak{K}^*$ isomorphic to $\mathfrak{K}$, where $\mathfrak{K}^*$ consists 
of all the residue classes that contain an element of $\mathfrak{K}$. Letting the element 
$a \in \mathfrak{K}$$^{50}$ represent its residue class [a], we see that the correspondence 
$a \leftrightarrow [a]$ is an isomorphism (§1.13), so that we can identify the isomorphic 
fields $\mathfrak{K}$ and $\mathfrak{K}^*$ by setting their elements equal to each other. For the 
residue classes [a] in $\mathfrak{K}^*$ we write simply a, as may be done without fear 
of ambiguity. 

If we now denote by f(X) the polynomial (47) in the new indeterminate X 

(47') $f(X) = a_0 + a_1 X + a_2 X^2 + \cdots + a_n X^n,$ 

it is easy to see that in the field $\Omega$ this polynomial has the zero [x], since in 
the residue class ring $\Omega$ 

$f([x]) = a_0 + a_1 [x] + \cdots + a_n [x]^n = [a_0 + a_1 x + \cdots + a_n x^n] = [f(x)] = [0].$ 

To sum up: a polynomial f(X) irreducible$^{51}$ over the field $\mathfrak{K}$ has at least 
one zero in the residue class field $\mathfrak{K}[x]/f(x)$, which may be regarded as an 
extension field of $\mathfrak{K}$. 
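
The inverse [h(x)] of a nonzero class [p(x)] can be computed by running the Euclidean algorithm in the polynomial ring while keeping track of the multiplier of p(x). The following sketch takes the rational numbers as base field; all function names and the test polynomials are assumptions made for illustration, and polynomials are lists of coefficients from the constant term upward.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients in place and return the list."""
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    size = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(size)])


def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def divmod_poly(p, f):
    """Quotient and remainder of p by the nonzero polynomial f."""
    rem = list(p)
    quo = [Fraction(0)] * max(len(rem) - len(f) + 1, 0)
    while len(rem) >= len(f):
        c = rem[-1] / f[-1]
        shift = len(rem) - len(f)
        quo[shift] = c
        for i, a in enumerate(f):
            rem[shift + i] -= c * a
        trim(rem)
    return trim(quo), rem


def inverse_mod(p, f):
    """Return h with p*h congruent to 1 modulo f, assuming p and f are coprime."""
    p = trim([Fraction(c) for c in p])
    f = trim([Fraction(c) for c in f])
    r0, r1 = f, divmod_poly(p, f)[1]
    t0, t1 = [], [Fraction(1)]          # invariant: t_i * p is congruent to r_i
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        t0, t1 = t1, add(t0, mul([Fraction(-1)], mul(q, t1)))
    if len(r0) != 1:
        raise ValueError("p and f are not coprime")
    return [c / r0[0] for c in t0]


inv = inverse_mod([1, 1], [-2, 0, 1])   # inverse of [1 + x] modulo x^2 - 2
```

With f(x) = $x^2 - 2$ the class [x] behaves like a square root of 2, and the computed inverse of [1 + x] is [−1 + x].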


4. Divisibility in Polynomial Rings. Elimination 


4.1. The process of deciding whether a given element in a u.f. ring is 
reducible or irreducible is usually very time-consuming, if it is possible 
at all. Even in the very simple u.f. ring G of rational integers the only 
practical way of deciding whether a given number is prime or not is to 
consult a table of primes (provided the given number lies within the range 
of the table).$^{52}$ So we must expect that in complicated u.f. rings, 
particularly in the polynomial rings G[x], R[x], G[x, y], ..., this question 
will not be a simple one. Of course, we can show, just as for the ring G, that 
in principle the question can be decided in finitely many steps, which it 
may be possible to simplify by more or less ingenious devices; but even 
then the actual process will usually require far more time than can be 
devoted to it. So we must content ourselves here with some useful lemmas 
and a few special criteria for irreducibility. 


50 Two distinct elements in $\mathfrak{K}$ cannot be contained in the same residue class modulo 
f(x). 

51 "Irreducible over $\mathfrak{K}$" means "irreducible in $\mathfrak{K}[x]$." 

52 A famous example for the difficulty of recognizing a prime is the number 
$2^{32} + 1 = 4294967297$, which Fermat (1601-1665) considered to be prime; but 
Euler (1707-1783) discovered that in fact it is composite: 641 · 6700417. 
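
The decision "in finitely many steps" for the ring G amounts to trial division, which the following sketch carries out for the number $2^{32} + 1$ mentioned above. The function name `smallest_factor` is an illustrative assumption, and the input is assumed to be an integer greater than 1.

```python
def smallest_factor(n):
    """Smallest divisor d > 1 of the integer n > 1, found by trial division."""
    if n % 2 == 0:
        return 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d
        d += 2
    return n  # no divisor up to the square root, so n is prime


fermat = 2 ** 32 + 1
factor = smallest_factor(fermat)
cofactor = fermat // factor
```

Only a few hundred trial divisions are needed here, but the number of steps grows with the square root of n, which is why the method is impractical for large numbers.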


4.2. A polynomial p(x) is irreducible in R[x] (“irreducible over R’’) 
if and only if the corresponding primitive polynomial p*(x) (see §2.11) is 
irreducible in G[x]. 

For if the primitive polynomial p* (x) is reducible in G[x], then certainly 
it remains so in R[x]; if it is prime in G[x], then it remains prime in R[x], 
as was shown in §2.14 (ftn. 35). 


4.3. If $\mathfrak{J}$ is an integral domain with unity element ($\mathfrak{J}$ may be a field), 
then a polynomial p(x) in $\mathfrak{J}[x]$ is divisible by a linear polynomial $x - \alpha$ 
($\alpha \in \mathfrak{J}$) if and only if $\alpha$ is a zero of p(x), i.e., $p(\alpha) = 0$. 

For if $p(x) = (x - \alpha) q(x)$, $q(x) \in \mathfrak{J}[x]$, it follows at once that $p(\alpha) = 0$. 
Conversely, if $p(\alpha) = 0$, then by the division algorithm$^{53}$ we can set up 
the identity 

$p(x) = (x - \alpha) q(x) + r, \quad q(x) \in \mathfrak{J}[x], \quad r \in \mathfrak{J},$ 

and replace the indeterminate x by $\alpha$; since $p(\alpha) = 0$, it follows that 
r = 0, so that $(x - \alpha) \mid p(x)$, as was to be proved. 
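
Division by a linear polynomial $x - \alpha$ with leading coefficient 1 can be carried out by the familiar Horner scheme, which stays inside the coefficient ring. In this sketch the function name and the sample polynomial are illustrative, and the coefficients are listed from the highest power downward.

```python
def divide_by_linear(coeffs, alpha):
    """Divide p(x) by x - alpha; coeffs are listed from the highest power down.

    Returns (quotient coefficients, remainder); the remainder equals p(alpha).
    """
    partial = []
    value = 0
    for c in coeffs:
        value = value * alpha + c
        partial.append(value)
    return partial[:-1], partial[-1]


p = [1, -6, 11, -6]                 # x^3 - 6x^2 + 11x - 6
quotient, remainder = divide_by_linear(p, 2)
_, other_remainder = divide_by_linear(p, 4)
```

For $\alpha = 2$ the remainder vanishes and the quotient is $x^2 - 4x + 3$; for $\alpha = 4$ the remainder is p(4) = 6, so x − 4 is not a divisor.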


4.4. For a primitive polynomial in G[x]: 

(48) $p(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n$ 

it is clear that a linear polynomial $a_0 + a_1 x$ can be a divisor of p(x) only 
if $\mathrm{GCD}(a_0, a_1) = 1$, $a_0 \mid c_0$, $a_1 \mid c_n$. In particular, if $c_n = 1$, then 
necessarily $a_1 = \pm 1$: a rational zero of an integral polynomial with 1 as 
highest coefficient is necessarily an integer. 
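
These divisibility conditions reduce the search for rational zeros to finitely many candidates $-a_0/a_1$. The sketch below assumes integer coefficients listed from $c_0$ to $c_n$ with $c_0 \neq 0$ and $c_n \neq 0$; the function names are illustrative.

```python
from fractions import Fraction


def divisors(n):
    """Positive divisors of the nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_zeros(coeffs):
    """All rational zeros of c_0 + c_1 x + ... + c_n x^n (c_0, c_n nonzero)."""
    c0, cn = coeffs[0], coeffs[-1]
    candidates = {Fraction(sign * a0, a1)
                  for a0 in divisors(c0)
                  for a1 in divisors(cn)
                  for sign in (1, -1)}
    return sorted(z for z in candidates
                  if sum(c * z ** k for k, c in enumerate(coeffs)) == 0)


zeros = rational_zeros([1, -3, 2])     # 1 - 3x + 2x^2 = (1 - x)(1 - 2x)
monic_case = rational_zeros([-2, 0, 1])  # x^2 - 2 has no rational zero
```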


4.5. The Irreducibility Criterion of Eisenstein 
A polynomial $p(x) \in G[x]$ is irreducible in G[x] if there exists a prime 
number $\pi$ such that all the coefficients $c_i$ (i = 0, 1, ..., n − 1), but not 
$c_n$, are divisible by $\pi$ and the first coefficient $c_0$ is not 
divisible by $\pi^2$. 


53 Cf. IB4, §1 (4); the division algorithm (26") can also be carried out in the 
non-Euclidean ring $\mathfrak{J}[x]$ if the "divisor" has a unit for its highest coefficient. 

For if we examine the factorization p(x) = f(x) g(x) we see that 
$\pi \mid c_0$, $\pi^2 \nmid c_0$, and $c_0 = a_0 b_0$ imply $\pi \mid a_0$, $\pi \nmid b_0$,$^{54}$ where the 
notation is the same as in §2.12. From the remaining conditions 
$\pi \mid c_1$, $\pi \mid c_2$, and so forth, it follows that $\pi \mid a_1$, $\pi \mid a_2$, ..., up to 
$\pi \mid a_{n-1}$; from $\pi \nmid c_n$ it follows that $\pi \nmid a_n$, so that $a_n \neq 0$. Thus f(x) has 
the same degree as p(x), so that f(x) is not a proper divisor. 
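
Checking the hypotheses of the criterion is purely mechanical. In the following sketch the function name is illustrative, the coefficients are listed from $c_0$ to $c_n$, and the argument `prime` is assumed (not verified) to be a prime number.

```python
def eisenstein_applies(coeffs, prime):
    """True if the Eisenstein conditions hold for c_0, ..., c_n and the given prime."""
    *lower, leading = coeffs
    return (leading % prime != 0
            and all(c % prime == 0 for c in lower)
            and coeffs[0] % (prime * prime) != 0)


first = eisenstein_applies([5, 10, 0, 0, 1], 5)   # x^4 + 10x + 5 with prime 5
second = eisenstein_applies([4, 0, 1], 2)         # x^2 + 4: 4 is divisible by 2^2
```

The criterion is sufficient but not necessary: $x^2 + 4$ is irreducible over the rationals although the test with the prime 2 fails for it.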


4.6. The greatest common divisor GCD(f(x), g(x)) of two polynomials f(x) 
and g(x) in a polynomial ring $\mathfrak{K}[x]$$^{56}$ can be calculated by the Euclidean 
algorithm (§2.9). But we now give another criterion, usually much easier 
to apply, for deciding whether two polynomials are coprime or not. Here 
it is convenient to write the polynomials in the following way: 

(49) $f(x) = a_0 x^m + a_1 x^{m-1} + \cdots + a_m, \quad a_0 \neq 0,$ 
$g(x) = b_0 x^n + b_1 x^{n-1} + \cdots + b_n, \quad b_0 \neq 0, \quad a_i, b_j \in \mathfrak{K}.$ 

The polynomials f(x) and g(x) have a nontrivial common divisor 
GCD(f(x), g(x)) = d(x) if and only if there exist in $\mathfrak{K}[x]$ a polynomial h(x) 
of degree $\leq n - 1$ and a polynomial k(x) of degree $\leq m - 1$ satisfying the 
identity 

(50) $h(x) f(x) + k(x) g(x) = 0 \quad (h(x), k(x) \neq 0).$ 

For if f(x) and g(x) are coprime, an equation of the form (50) would 
imply (since $\mathfrak{K}[x]$ is a u.f. ring) that f(x) | k(x) and g(x) | h(x), which is 
impossible since k(x) is of lower degree than f(x), and h(x) is of lower 
degree than g(x). Conversely, if the GCD(f(x), g(x)) = d(x) is a 
polynomial of positive degree, the polynomials h(x) = g(x)/d(x) and 
k(x) = −f(x)/d(x) satisfy all the conditions of the theorem. 


4.7. A Criterion Based on the Resultant 

From the preceding section we can at once deduce that the polynomials 
J(x) and g(x) in &[x] are coprime if and only if the following m+n 
polynomials 


(SI) xP f(x), x F(x), LO), x T(x), x™ 8g (x), ..., 8%) 
are linearly independent*" over 8. 


54 Of course there is no loss of generality in assuming $\pi \mid a_0$ rather than $\pi \mid b_0$. The
notation $\pi \nmid b_0$ means "$\pi$ is not a divisor of $b_0$."

55 In this proof we have used only the u.f. theorem; so the Eisenstein criterion is
valid in any polynomial ring J[x] over a u.f. ring J.

56 Here R can be an entirely arbitrary field; in particular, one of the polynomial
quotient fields R(y), R(y, z), ....

57 For linear dependence see IB3, §1.3.

For equation (50) simply expresses the linear dependence over R of the
polynomials (51); and conversely, this linear dependence implies the
existence in R of elements $\alpha_i$ and $\beta_j$ (not all zero) such that


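$$
\begin{aligned}
&\alpha_0 x^{n-1} f(x) + \alpha_1 x^{n-2} f(x) + \cdots + \alpha_{n-1} f(x) \\
&\quad + \beta_0 x^{m-1} g(x) + \beta_1 x^{m-2} g(x) + \cdots + \beta_{m-1} g(x) = 0
\end{aligned}
$$

is an identity in x, and this identity is of the form (50) with

$$
\begin{aligned}
h(x) &= \alpha_0 x^{n-1} + \alpha_1 x^{n-2} + \cdots + \alpha_{n-1}, \\
k(x) &= \beta_0 x^{m-1} + \beta_1 x^{m-2} + \cdots + \beta_{m-1}.
\end{aligned}
$$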


If we now consider the polynomials (51) as linear forms in the m + n 
magnitudes $x^{m+n-1}, x^{m+n-2}, \ldots, x, 1$, the question of their linear dependence
or independence can be decided (IB3, §3.4) by constructing the determinant 
of their coefficient matrix. For the polynomials (49) this determinant, 
which is called the Sylvester determinant, has the following form: 


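$$
\begin{vmatrix}
a_0 & a_1 & \cdots & a_m & & & \\
 & a_0 & a_1 & \cdots & a_m & & \\
 & & \ddots & \ddots & & \ddots & \\
 & & & a_0 & a_1 & \cdots & a_m \\
b_0 & b_1 & \cdots & b_n & & & \\
 & \ddots & \ddots & & \ddots & & \\
 & & & b_0 & b_1 & \cdots & b_n
\end{vmatrix}
\tag{52}
$$

Here the first n rows are formed from the coefficients of f(x), the last
m rows from the coefficients of g(x), and all entries not shown are zero.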


The first row contains the coefficients $a_0, a_1, \ldots, a_m$ of f(x) followed by
zeros; the second row begins with a zero and is otherwise equal to the first 
row shifted one place to the right, and so on; and the second half of the 
determinant is constructed analogously. It is easy to see that in this way 
we will obtain exactly m + n columns. 

The Sylvester determinant is called the resultant R(f, g) of the polynomials
(49). The vanishing of the resultant is a necessary and sufficient condition
for the polynomials f(x) and g(x) to have a nontrivial common factor.

For in fact the vanishing of R(f, g) is a necessary and sufficient condition
for the linear dependence of the polynomials (51).
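The construction of the Sylvester determinant can be sketched in a few lines of Python. The sketch below assumes integer coefficients given in descending powers, as in (49), and evaluates the determinant by cofactor expansion, which is exact but only practical for small m + n. The function names `sylvester_matrix`, `det` and `resultant` and the sample polynomials are illustrative choices.

```python
def sylvester_matrix(f, g):
    """Rows of the Sylvester matrix for f = [a_0, ..., a_m] and
    g = [b_0, ..., b_n] (coefficients in descending powers)."""
    m, n = len(f) - 1, len(g) - 1
    rows = [[0] * k + list(f) + [0] * (n - 1 - k) for k in range(n)]   # n rows from f
    rows += [[0] * k + list(g) + [0] * (m - 1 - k) for k in range(m)]  # m rows from g
    return rows

def det(mat):
    """Determinant by cofactor expansion along the first row
    (exact for integers; adequate for small matrices only)."""
    if not mat:
        return 1
    total = 0
    for j, entry in enumerate(mat[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in mat[1:]]
            total += (-1) ** j * entry * det(minor)
    return total

def resultant(f, g):
    return det(sylvester_matrix(f, g))

f = [1, -3, 2]            # (x - 1)(x - 2)
g_common = [1, 3, -10]    # (x - 2)(x + 5): shares the factor x - 2 with f
g_coprime = [1, -3]       # x - 3: coprime to f

print(resultant(f, g_common))    # 0
print(resultant(f, g_coprime))   # 2
```

The resultant vanishes exactly for the pair with a common factor, in agreement with the criterion above.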


4.8. The resultant R(f, g) is homogeneous of degree n in the $a_i$ and of degree m
in the $b_j$; and it is isobaric of weight mn in the two sets of coefficients; its leading
term is $a_0^n b_n^m$ (with the coefficient +1). Also

$$ R(g, f) = (-1)^{mn} R(f, g). \tag{53} $$
For it is easy to see,⁵⁸ by developing the determinant (52) (see IB3, §3.4), that
the individual terms of the resultant consist of n factors $a_i$ and m factors $b_j$:

$$ a_{i_1} a_{i_2} \cdots a_{i_n} \, b_{j_1} b_{j_2} \cdots b_{j_m}, \tag{54} $$

with rational integers for coefficients. The sum of all the indices in (54) is mn
(i.e., the resultant is "isobaric of weight mn"):

$$ i_1 + i_2 + \cdots + i_n + j_1 + j_2 + \cdots + j_m = mn. \tag{54'} $$

58 The indices $i_1, \ldots, i_n$ and $j_1, \ldots, j_m$ in (54) are not necessarily distinct.
The proof of (54') is as follows. In the determinant (52) we make the substitution

$$ a_i \to \rho^i a_i, \qquad b_j \to \rho^j b_j; $$

then we multiply the successive rows of (52) by $1, \rho, \rho^2, \ldots, \rho^{n-1}, 1, \rho, \rho^2, \ldots, \rho^{m-1}$,
and divide the successive columns by the factors $1, \rho, \rho^2, \ldots, \rho^{m+n-1}$. We thus
recover the original determinant, so that the substitution multiplies the
determinant⁵⁹ by the factor $\rho^{mn}$; but in the above
substitution each individual term (54) is multiplied by the factor

$$ \rho^{i_1 + i_2 + \cdots + i_n + j_1 + j_2 + \cdots + j_m} = \rho^{mn}, $$

which completes the proof of (54').

The formula (53) is a simple consequence of a well-known theorem on
determinants (IB3, §3.4(a') and §3.5.2). Note that the determinant on the
right arises from the one on the left by mn interchanges of rows (each of the
m lower rows is interchanged with each of the n upper rows), which produce
the factor $(-1)^{mn}$.

The leading term $a_0^n b_n^m$, by means of which the resultant may be normalized,
is the product of the elements in the leading diagonal in (52).

For example, if m = n = 2, the resultant is given by 


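$$
\begin{aligned}
R(f, g) &= \begin{vmatrix}
a_0 & a_1 & a_2 & 0 \\
0 & a_0 & a_1 & a_2 \\
b_0 & b_1 & b_2 & 0 \\
0 & b_0 & b_1 & b_2
\end{vmatrix} \\
&= a_0^2 b_2^2 + a_2^2 b_0^2 - a_0 a_1 b_1 b_2 - a_1 a_2 b_0 b_1 + a_0 a_2 b_1^2 + a_1^2 b_0 b_2 - 2 a_0 a_2 b_0 b_2 \\
&= (a_0 b_2 - a_2 b_0)^2 - (a_0 b_1 - a_1 b_0)(a_1 b_2 - a_2 b_1).
\end{aligned}
$$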
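As a numerical cross-check of the case m = n = 2, the following Python sketch compares the closed form in the last line of the display with a direct cofactor expansion of the 4 × 4 determinant. The sample coefficient triples are arbitrary test data and the function names are illustrative.

```python
def det(mat):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not mat:
        return 1
    total = 0
    for j, entry in enumerate(mat[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in mat[1:]]
            total += (-1) ** j * entry * det(minor)
    return total

def sylvester_2_2(a, b):
    """The 4 x 4 Sylvester matrix of two quadratics a = (a0, a1, a2), b = (b0, b1, b2)."""
    a0, a1, a2 = a
    b0, b1, b2 = b
    return [[a0, a1, a2, 0],
            [0, a0, a1, a2],
            [b0, b1, b2, 0],
            [0, b0, b1, b2]]

def resultant_2_2(a, b):
    """Closed form (a0 b2 - a2 b0)^2 - (a0 b1 - a1 b0)(a1 b2 - a2 b1)."""
    a0, a1, a2 = a
    b0, b1, b2 = b
    return (a0 * b2 - a2 * b0) ** 2 - (a0 * b1 - a1 * b0) * (a1 * b2 - a2 * b1)

samples = [((1, -3, 2), (1, 3, -10)),   # common zero x = 2
           ((2, 0, -1), (3, 1, 4)),
           ((1, 1, 1), (1, -1, 1))]
for a, b in samples:
    assert det(sylvester_2_2(a, b)) == resultant_2_2(a, b)
print([resultant_2_2(a, b) for a, b in samples])   # [0, 119, 4]
```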


4.9. The Resultant as a Function of the Zeros or Roots 


We shall denote the zeros⁶⁰ of the polynomial f(x) in (49) by
$\alpha_1, \alpha_2, \ldots, \alpha_m$, and the zeros of g(x) by $\beta_1, \beta_2, \ldots, \beta_n$, and consider
these zeros as independent transcendents in the sense of IB4, §2.3, which
must be adjoined⁶¹ to the base field R of the polynomial ring R[x]. Then
f(x) and g(x) can be factored into linear factors (IB4, §2.2):

$$ f(x) = a_0 (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_m), \tag{55'} $$

$$ g(x) = b_0 (x - \beta_1)(x - \beta_2) \cdots (x - \beta_n). \tag{55''} $$

59 By the well-known formula for the sum of an arithmetic progression,
$$ 1 + 2 + 3 + \cdots + (n - 1) = \tfrac{1}{2} n (n - 1), $$
we here obtain
$$ 1 + 2 + \cdots + (m + n - 1) - 1 - 2 - \cdots - (m - 1) - 1 - 2 - \cdots - (n - 1) = mn. $$

60 Cf. IB4, §2.2.


Apart from sign, the quotients $a_i/a_0$, $b_j/b_0$ of the polynomial coefficients
(49) are the elementary symmetric polynomials (IB4, §2.4) of the $\alpha_i$ and $\beta_j$;
so $a_0^{-n} b_0^{-m} R(f, g)$ is an entire rational function of the elementary symmetric
polynomials and is therefore a symmetric polynomial in the indeterminates
$\alpha_i$ and in the $\beta_j$:

$$ a_0^{-n} b_0^{-m} R(f, g) = P(\alpha_1, \ldots, \alpha_m, \beta_1, \ldots, \beta_n). $$


Considered as a polynomial in the $\alpha_i$, the expression P has the zeros
$\beta_j$ (j = 1, ..., n), since substitution of $\beta_j$ for $\alpha_i$ is necessary and sufficient
for the polynomials f(x) and g(x) to acquire a common factor $x - \beta_j$,
whereupon the resultant R(f, g) becomes equal to zero. By §4.3, it follows
that P is divisible by every linear factor $\alpha_i - \beta_j$, so that we may write

$$ a_0^{-n} b_0^{-m} R(f, g) = C \prod_{i=1}^{m} \prod_{j=1}^{n} (\alpha_i - \beta_j), $$


with a factor C still to be determined. From (55'') we have

$$ a_0^{-n} b_0^{-m} R(f, g) = C b_0^{-m} \prod_{i=1}^{m} g(\alpha_i) = C b_0^{-m} \prod_{i=1}^{m} (b_0 \alpha_i^n + \cdots + b_n), $$

so that the leading term $a_0^n b_n^m$ in R(f, g) (see §4.8) corresponds to $C a_0^n b_n^m$,
which gives C = 1 and finally:

$$ R(f, g) = a_0^n b_0^m \prod_{i=1}^{m} \prod_{j=1}^{n} (\alpha_i - \beta_j). \tag{56} $$


This relation is an identity in the indeterminates $\alpha_i$ and $\beta_j$ if the
$a_i/a_0$, $b_j/b_0$ are replaced by the elementary symmetric polynomials; thus
the relation continues to hold if the zeros $\alpha_i$, $\beta_j$ are no longer indeterminates
but are arbitrary elements of the field R or of an extension field of R.

From (56) we can at once read off the characteristic property of the 
resultant: namely, the resultant of two polynomials vanishes if and only if 
the two polynomials have a common root. 
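The product formula can be tested numerically. The Python sketch below chooses integer zeros (an assumption made only so that all arithmetic is exact), builds f and g from them, and compares the double product of differences of zeros with the product of the values of g at the zeros of f and with the product of the values of f at the zeros of g. The helper names and sample data are illustrative.

```python
from math import prod

def poly_from_roots(lead, roots):
    """Coefficients (descending powers) of lead * (x - r_1) ... (x - r_k)."""
    coeffs = [lead]
    for r in roots:
        # multiply the current polynomial p by (x - r): x*p - r*p
        coeffs = [c - r * prev for c, prev in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

def evaluate(coeffs, x):
    """Horner evaluation of a polynomial given in descending powers."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

# Assumed example data.
a0, alphas = 2, [1, -1, 3]     # f has degree m = 3
b0, betas = 5, [0, 4]          # g has degree n = 2
m, n = len(alphas), len(betas)
f = poly_from_roots(a0, alphas)
g = poly_from_roots(b0, betas)

double_product = a0**n * b0**m * prod(a - b for a in alphas for b in betas)
via_g = a0**n * prod(evaluate(g, a) for a in alphas)
via_f = (-1) ** (m * n) * b0**m * prod(evaluate(f, b) for b in betas)
print(double_product, via_g, via_f)   # 22500 22500 22500
```

If one of the zeros of g is replaced by a zero of f, a factor of the double product becomes zero and all three values vanish together.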


61 The original base field R is thus replaced by the transcendental extension
$R(\alpha_1, \ldots, \alpha_m, \beta_1, \ldots, \beta_n)$.

The above discussion also leads at once to the two expressions for the
resultant:

$$ R(f, g) = a_0^n \prod_{i=1}^{m} g(\alpha_i) = (-1)^{mn} b_0^m \prod_{j=1}^{n} f(\beta_j). \tag{56'} $$


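Furthermore if $g(x) = g_1(x) g_2(x)$ is the product of two polynomials of (positive)
degree $n_1$, $n_2$ ($n_1 + n_2 = n$), it is easy to show that

$$ R(f, g) = a_0^{n} \prod_{i=1}^{m} g_1(\alpha_i) g_2(\alpha_i) = \Bigl[ a_0^{n_1} \prod_{i=1}^{m} g_1(\alpha_i) \Bigr] \Bigl[ a_0^{n_2} \prod_{i=1}^{m} g_2(\alpha_i) \Bigr], $$

which gives the important formula

$$ R(f, g_1 g_2) = R(f, g_1) \, R(f, g_2). \tag{56''} $$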


4.10. The discriminant D(f) of a polynomial f(x) in R[x] is defined,
up to a numerical factor, as the resultant of the polynomial f(x) and its
derivative⁶² f'(x):

$$ D(f) = \frac{(-1)^{m(m-1)/2}}{a_0} \, R(f, f'). \tag{57} $$

From (56') and (58)⁶² we find

$$ D(f) = (-1)^{m(m-1)/2} a_0^{m-2} \prod_{i=1}^{m} f'(\alpha_i) = (-1)^{m(m-1)/2} a_0^{2m-2} \prod_{i=1}^{m} {\prod_{j=1}^{m}}' (\alpha_i - \alpha_j). $$

In the double product on the right every factor $(\alpha_i - \alpha_j)$ occurs twice,
the second time with opposite sign; taking these factors together and
noting the sign, we have

$$ D(f) = a_0^{2m-2} \prod_{i<j} (\alpha_i - \alpha_j)^2. \tag{59} $$

The vanishing of the discriminant D(f) of a polynomial f(x) in R[x] is 
a necessary and sufficient condition for f(x) and f'(x) to have a common 
zero, or in other words for f(x) to have a multiple zero (cf. IB4, §2.2). 
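The definition of the discriminant as a normalized resultant of f and f' can be tried out directly. The following Python sketch assumes integer coefficients in descending powers and characteristic zero; it forms the Sylvester determinant of f and its derivative and applies the sign and the division by the leading coefficient. For a quadratic ax² + bx + c this reproduces the familiar b² − 4ac. The function names are illustrative.

```python
def sylvester_matrix(f, g):
    """Rows of the Sylvester matrix of f and g (coefficients in descending powers)."""
    m, n = len(f) - 1, len(g) - 1
    rows = [[0] * k + list(f) + [0] * (n - 1 - k) for k in range(n)]
    rows += [[0] * k + list(g) + [0] * (m - 1 - k) for k in range(m)]
    return rows

def det(mat):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not mat:
        return 1
    total = 0
    for j, entry in enumerate(mat[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in mat[1:]]
            total += (-1) ** j * entry * det(minor)
    return total

def derivative(f):
    """Derivative of f = [a_0, ..., a_m] (descending powers)."""
    m = len(f) - 1
    return [(m - i) * c for i, c in enumerate(f[:-1])]

def discriminant(f):
    """D(f) = (-1)^(m(m-1)/2) * R(f, f') / a_0."""
    m = len(f) - 1
    r = det(sylvester_matrix(f, derivative(f)))
    sign = (-1) ** (m * (m - 1) // 2)
    return sign * r // f[0]    # exact: the first column of the matrix is divisible by a_0

print(discriminant([2, 3, -5]))     # 49 = 3^2 - 4*2*(-5)
print(discriminant([1, -2, 1]))     # 0: (x - 1)^2 has a double zero
print(discriminant([1, 0, -1, 0]))  # 4: x^3 - x has the simple zeros 0, 1, -1
```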


62 For the derivative of a polynomial see IB4, §2.2. From the formula proved there,

$$ f'(x) = \frac{f(x)}{x - \alpha_1} + \frac{f(x)}{x - \alpha_2} + \cdots + \frac{f(x)}{x - \alpha_m}, $$

it follows that

$$ f'(\alpha_i) = a_0 {\prod_{j=1}^{m}}' (\alpha_i - \alpha_j), \tag{58} $$

where the prime on the product means that the term with j = i is to be omitted.

4.11. Elimination Theory

The problem of solving a system of algebraic equations in several 
unknowns, i.e., of finding their common zeros, is part of the theory of 
elimination. The case of systems of linear equations has already been 
handled in IB3, §§2.4 and 3.6; the solutions there can be found by means 
of determinants. For nonlinear systems of equations we adopt the method 
of elimination; i.e., from the given system of equations we deduce another 
system containing one fewer unknown (from which one unknown has
been eliminated), and we repeat this elimination until we reach a single 
equation with one unknown. 

Provided we have taken certain precautions, every solution of the system 
of equations obtained by elimination can be extended, in at least one way, 
to a solution of the original system. Here we take for granted that we 
know how to solve an algebraic equation in one unknown. 

We must be content with illustrating the method for the case of two
equations⁶³ in two unknowns

$$ f(x, y) = 0, \qquad g(x, y) = 0. \tag{60} $$

If f(x, y) and g(x, y) are polynomials in the polynomial ring R[x, y], we
first consider them as polynomials in R*[x] with R* = R(y). Then the
polynomials f and g have the form (49), where the coefficients $a_i$, $b_j$ are
now polynomials in y. In order to apply the following theory we must first
ensure that the leading coefficients $a_0$ and $b_0$ (which may depend on y)
are elements (≠ 0) in the base field R. This condition can always be satisfied
by means of a sufficiently general linear transformation.

Then the necessary and sufficient condition for the polynomials (60)
to have a common zero is the vanishing of the resultant R(f, g), which is
here (cf. (52)) a polynomial in y. For the complete solution of the system
(60) we must determine the zeros⁶⁴ $\beta_1, \ldots, \beta_s$ of this resultant and substitute
them into (60); the resulting polynomials $f(x, \beta_j)$ and $g(x, \beta_j)$ have a GCD
of degree ≥ 1, which can be calculated by the Euclidean algorithm; let its
zeros be denoted by $\alpha_{j1}, \ldots, \alpha_{j s_j}$. Then the common solutions of (60) are given by

$$ x = \alpha_{ji}, \quad y = \beta_j; \qquad j = 1, \ldots, s; \quad i = 1, \ldots, s_j. $$
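A small worked case may make the procedure concrete. The Python sketch below treats the assumed system f(x, y) = x² + y² − 5 = 0, g(x, y) = x − y − 1 = 0, whose leading coefficients in x are both 1. Instead of computing the resultant symbolically and the GCD by the Euclidean algorithm, it takes a shortcut that is adequate only for this toy example: it evaluates the Sylvester determinant for small integer values of y and then searches for common integer zeros in x. All names and the search range are illustrative assumptions.

```python
def sylvester_matrix(f, g):
    """Rows of the Sylvester matrix of f and g (coefficients in descending powers)."""
    m, n = len(f) - 1, len(g) - 1
    rows = [[0] * k + list(f) + [0] * (n - 1 - k) for k in range(n)]
    rows += [[0] * k + list(g) + [0] * (m - 1 - k) for k in range(m)]
    return rows

def det(mat):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not mat:
        return 1
    total = 0
    for j, entry in enumerate(mat[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in mat[1:]]
            total += (-1) ** j * entry * det(minor)
    return total

def evaluate(coeffs, x):
    """Horner evaluation of a polynomial given in descending powers."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

def f_in_x(y):
    """Coefficients in x of f(x, y) = x^2 + y^2 - 5 for a fixed value of y."""
    return [1, 0, y * y - 5]

def g_in_x(y):
    """Coefficients in x of g(x, y) = x - y - 1 for a fixed value of y."""
    return [1, -(y + 1)]

candidates = range(-10, 11)   # assumption: only small integer values are searched
solutions = []
for y in candidates:
    if det(sylvester_matrix(f_in_x(y), g_in_x(y))) == 0:   # resultant vanishes at y
        for x in candidates:
            if evaluate(f_in_x(y), x) == 0 and evaluate(g_in_x(y), x) == 0:
                solutions.append((x, y))
print(solutions)   # [(-1, -2), (2, 1)]
```

In this example the resultant is the polynomial 2y² + 2y − 4, with the zeros y = −2 and y = 1, and each zero extends to exactly one common solution.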


A completely satisfactory theory of elimination can be given only in terms
of the theory of ideals in polynomial rings $P_n = R[x_1, \ldots, x_n]$. The left-hand
sides of a given system of algebraic equations are polynomials $p_1, \ldots, p_r$ in $P_n$;
they generate an ideal $\mathfrak{a} = (p_1, \ldots, p_r)$ in $P_n$, and our task is to determine the
manifold of the zeros of this ideal. We eliminate $x_n$ by forming the elimination
ideal $\mathfrak{a}_1 = \mathfrak{a} \cap P_{n-1}$ in⁶⁵ $P_{n-1} = R[x_1, \ldots, x_{n-1}]$, and then form the second
elimination ideal $\mathfrak{a}_2 = \mathfrak{a} \cap P_{n-2}$ in $P_{n-2} = R[x_1, \ldots, x_{n-2}]$, and so forth. The
last nonzero elimination ideal is a principal ideal. If it is the unit ideal (the
entire ring), then there are no zeros at all, otherwise its manifold of zeros
is a sum of algebraic manifolds.

63 Strictly speaking, these are not equations but problems, namely, to find all the
common zeros of the polynomials on the left-hand sides.

64 These zeros are finite in number, unless R(f, g) vanishes identically, which would
mean that the polynomials f(x, y) and g(x, y) are not coprime.

The investigation becomes even more difficult if the multiplicity of the zeros 
is to be taken into account. The concept of multiplicity plays an important role 
in theorems of enumeration, modeled after the theorem that a polynomial f(x) 
of degree m has exactly m zeros, counting multiplicities. The most important 
theorem of this kind is the theorem of Bézout: 

If multiplicities are taken into account, n homogeneous polynomials in
$R[x_0, x_1, \ldots, x_n]$ have either infinitely many zeros or a number of zeros equal
to the product of their degrees.

In view of the homogeneity here, the trivial zero {0, 0, ..., 0} is excluded and
two zeros $\{\xi_0, \xi_1, \ldots, \xi_n\}$ and $\{\rho\xi_0, \rho\xi_1, \ldots, \rho\xi_n\}$, with $\rho \neq 0$ and $\xi_i$ in R or in
an algebraic extension of R, are regarded as identical. The multiplicity of
an individual zero can also be defined, in terms of the theory of ideals, as
the length (see below) of a corresponding primary ideal, which arises in the
representation (§3.6) of the ideal generated by the n forms as an
intersection of primary ideals.

The length l of a primary ideal $\mathfrak{q}$ is defined as the length of a composition
series (cf. the corresponding concept for groups in IB2, §12.1), extending
from the primary ideal $\mathfrak{q}$ to the associated (§3.6, p. 355) prime ideal $\mathfrak{p}$:

$$ \mathfrak{q} = \mathfrak{q}_1 \subset \mathfrak{q}_2 \subset \cdots \subset \mathfrak{q}_l = \mathfrak{p}. $$

Here it is assumed that all the terms $\mathfrak{q}_j$ (j = 1, ..., l) are primary ideals associated
with the same prime ideal $\mathfrak{p}$, that $\mathfrak{q}_j$ is a proper subideal (i.e., a proper subset) of
$\mathfrak{q}_{j+1}$, and that the series cannot be made longer by the insertion of further
terms. In particular, every prime ideal is of length 1. The theorems here are
similar to those for the composition series of a group: e.g., in a Noetherian
ring (§3.2) every primary ideal $\mathfrak{q}$ has at least one composition series of finite
length l; and every other composition series for the same primary ideal $\mathfrak{q}$ has
the same length l (Jordan-Hölder theorem, IB2, §12.1).

In algebraic geometry still other definitions, some of them quite complicated, 
have been introduced for the multiplicity of points of intersection, but as long 
as we are dealing with applications of the concept as it occurs in the simple 
theorem of Bézout, the various definitions of multiplicity are all equivalent 
to the ideal-theoretic one given here.


65 Thus $\mathfrak{a}_1$ contains all those polynomials in $\mathfrak{a}$ which are independent of $x_n$.


CHAPTER 6 


Theory of Numbers 


1. Introduction 


The theory of the natural numbers may be regarded as number theory 
in the narrower sense (see IB1, §1), but no matter how far we may wish to 
set the boundaries, it remains one of the most attractive parts of 
mathematics; for the most part its problems can be understood without 
extensive preparation, and they range from questions that can easily be 
answered to famous unsolved conjectures. 

The modern theory of numbers includes the study of so-called algebraic 
numbers, i.e., the roots (zeros) of polynomials with coefficients that are 
integers in the ordinary sense (rational integers). Under the influence of 
the structure-theoretic methods of present-day mathematics, certain parts 
of number theory have become more abstract. The advantages of such a 
treatment of the subject are particularly clear in the theory of divisibility, 
described in §2. The concepts and theorems of that section are equally 
valid for the ring of Gaussian integers (see IB5, §1.1), i.e., the numbers
a + bi with rational integers a and b, for the polynomial ring in one
indeterminate (see IB4, §2.1) with coefficients from a field, and for many 
other rings. Thus it is unnecessary to begin the argument afresh for each 
new application. 


2. Divisibility Theory 


2.1. For the time being we consider an arbitrary commutative¹ ring
R. Let a and b be two elements of R; then a is said to be a divisor of b,
in symbols a | b, if b = ac with c ∈ R. To denote the opposite, we write
a ∤ b. This divisibility relation is obviously transitive: from a | b and b | c
follows a | c. For all x ∈ R we have x | 0, and thus in particular 0 | 0. If R
has a unity element, we have the reflexive law: x | x for all x ∈ R, and also
1 | x. In the relation b = ac the element c is said to be complementary to
the divisor a.

1 Throughout Chapter 6, except where otherwise noted, we shall be dealing with
commutative rings, so that the word "commutative" will ordinarily be omitted.


2.2. The relation of divisibility defined in this way can be regarded as a
weakening of the order relation (see IA, §8.3). In an order relation "≤" it
follows from a ≤ b and b ≤ a that a = b, but in the ring $\mathfrak{G}$ of integers² the
fact that (−2) | (+2) and (+2) | (−2) are both true shows that the divisibility
relation is certainly not an ordering, not even a partial ordering (see IA, §8.3).
If 1 ∈ R, such a relation is called a quasi-ordering, i.e., a reflexive and transitive
relation (to be denoted, say, by "≤") for which there may exist unrelated
elements, i.e., elements a and b such that neither a ≤ b nor a = b nor b ≤ a.


2.3. Now let R be a ring with unity element. An element ε ∈ R is
called a unit if there exists an η ∈ R such that εη = 1. Then η is an inverse
of ε in the sense of IB1, §3.1, so that η = ε⁻¹ is also a unit. The unity element
is obviously a unit, and so is the product of two units. Thus the units 
form a group with respect to multiplication, so that the inverse of a unit 
is uniquely determined (see IB2, §2.3). 

For example, in the ring of Gaussian integers the numbers 1, −1, i and
−i are the only units, as is easily seen. In a field all the nonzero elements
are obviously units. 
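The statement about the Gaussian integers can be confirmed by a direct search. The Python sketch below represents a + bi as the pair (a, b) and looks for elements with an inverse inside a small box; restricting the search to a box is an assumption of the sketch, harmless here because the norm a² + b² is multiplicative, so a unit and its inverse must both have norm 1.

```python
def gauss_mul(z, w):
    """Product of the Gaussian integers z = (a, b) ~ a + bi and w = (c, d) ~ c + di."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

box = [(a, b) for a in range(-3, 4) for b in range(-3, 4)]
units = [z for z in box if any(gauss_mul(z, w) == (1, 0) for w in box)]
print(units)   # [(-1, 0), (0, -1), (0, 1), (1, 0)], i.e. -1, -i, i, 1
```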

If c = ab is a factorization of c, then so is c = (aε) · (ε⁻¹b), where ε is
a unit; thus it is clear that, as far as factorization is concerned, the elements
a and aε are not essentially distinct. Such elements a and aε are said to be
associates: a ~ aε. The relation of "associate" is obviously reflexive,
symmetric, and transitive and is thus an equivalence (see IA, §8.5). The 
corresponding equivalence classes are the classes of associated elements. 


2.4. In general, in a quasi-ordered set two elements a and b are said to be
associates if a ≤ b and also b ≤ a. It is easy to show that the quasi-ordering
induces (see IA, §8.3) a partial ordering in the set of corresponding equivalence 
classes. 


2.5. Now let a, b and t be elements of a ring R (with unity element)
such that t | a and t | b. Then it is clear that t is also a divisor of any
linear combination xa + yb ({x, y} ⊂ R). The set $\mathfrak{B}(a, b) = \mathfrak{B}$ of all
linear combinations of a and b has certain remarkable properties:

I   From $\{v_1, v_2\} \subset \mathfrak{B}$ it follows that $v_1 + v_2 \in \mathfrak{B}$.
II  If $v \in \mathfrak{B}$ and r ∈ R, then $rv \in \mathfrak{B}$.

The first property states that $\mathfrak{B}$ is a module (i.e., an Abelian group
with additively written operation; see also IB2, §1.1); more precisely: $\mathfrak{B}$
is a submodule of the additive group $R^+$ of R.³ If we take $r \in \mathfrak{B}$ in (II), we
see at once that $\mathfrak{B}$ is a subring of R; but the condition (II) is much
stronger, since we may take r ∈ R. By IB5, §3.1 the conditions (I) and (II)
are precisely the definition of an ideal in R, so that in the notation of
IB5, §3.2 we see that $\mathfrak{B}$ is the ideal (a, b).

2 The symbol $\mathfrak{G}$ will be used throughout Chapter 6 to denote the ring of rational
integers.


If R is a not necessarily commutative ring, we have left ideals and right ideals,
and also two-sided ideals, according to whether in II we may set r on the left,
on the right, or on both sides.⁴

From a ~ b it obviously follows that the principal ideals (a) and (b) coincide,
and from a | b follows (b) ⊆ (a) and conversely.


2.6. Any element a of a ring is obviously divisible not only by its
associates but also by every unit in the ring. The associates and the units
are said to be trivial divisors of a. A divisor d of a which is not an associate
of a is said to be a proper divisor of a, which we shall occasionally denote
by $d \mid_{\mathrm{pr}} a$. An element a ∈ R is said to be reducible if there exists at least
one factorization $a = a_1 a_2 \cdots a_n$ in which all the $a_i$ are proper divisors
of a; otherwise a is irreducible. If a is irreducible and a = ab, then either
b = 1 or 1 − b is a nontrivial divisor of zero (see IB5, §1.7), since
a(1 − b) = 0. In an integral domain (i.e., a commutative ring without
divisors of zero; see IB5, §1.9) the nonzero irreducible elements are identical
with the elements that have only trivial divisors of zero.

In the ring $\mathfrak{G}$ the positive irreducible numbers are called prime numbers.⁵

We now say that a ring R with unity element admits a theory of
divisibility if it satisfies the following condition:


3 With respect to addition alone, every ring R is obviously an Abelian group, the
so-called additive group of the ring, in symbols $R^+$. With respect to multiplication the
set R − {0} is a semigroup, the so-called multiplicative semigroup of R, in symbols $R^\times$.
If R is a field, or only a skew field, then $R^\times$ is obviously a group, the multiplicative group
of the (skew) field. In this case the multiplicative group is obviously identical with the
group of units. It was merely to preserve this group property for the special case of
a field that we excluded the zero element from $R^\times$.

4 If R has no unity element, the conditions (I) and (II) still define an ideal. By (I) the
left ideal $(a_1, a_2, \ldots, a_n)$ consists of all expressions of the form

$$ x_1 a_1 + \cdots + x_n a_n + g_1 a_1 + \cdots + g_n a_n, \qquad \{x_1, x_2, \ldots, x_n\} \subset R, \quad \{g_1, g_2, \ldots, g_n\} \subset \mathfrak{G}, $$

where for a positive integer g the product ga is defined by $\sum_{i=1}^{g} a$, and (−g)a = −(ga).
In a ring with unity element e we have $g \cdot a = g \cdot ea = (ge) \cdot a$ ($g \in \mathfrak{G}$), so that in view
of ge ∈ R we may omit the expression $g_1 a_1 + \cdots + g_n a_n$.

5 This definition is not universally accepted. For many authors the zero element and
the unity element are not prime numbers. By the above definition the zero element and
all the units in an integral domain are irreducible.

Fundamental condition of the theory of divisibility: let $\mathfrak{P}$ be a system of
representatives of the classes of associated irreducible elements, excluding
the class of units. Then for every a ≠ 0 there exists a representation of the
form⁶

$$ a = \varepsilon \prod_{\lambda=1}^{s} p_\lambda^{\alpha_\lambda}, \quad s \geq 0, \quad \alpha_\lambda > 0 \quad (\varepsilon \text{ a unit}), \qquad p_\kappa \neq p_\lambda \text{ for } \kappa \neq \lambda, \quad p_\lambda \in \mathfrak{P} \quad (\lambda = 1, \ldots, s), \tag{1} $$

which is unique apart from the order of the factors, i.e., for two
representations of the form (1)

$$ \varepsilon_1 \prod_{\lambda=1}^{s} p_\lambda^{\alpha_\lambda} = \varepsilon_2 \prod_{\mu=1}^{t} q_\mu^{\beta_\mu} $$

it follows that s = t ≥ 0 and, with a suitable arrangement of the factors,
$p_\lambda = q_\lambda$, $\alpha_\lambda = \beta_\lambda$ for all λ = 1, ..., s and $\varepsilon_1 = \varepsilon_2$.

In rings satisfying this condition the irreducible elements are also called
prime elements (for the general definition of a prime element in rings see
§8.2 and IB5, §2.3), the rings themselves are called u.f. rings (unique
factorization rings), the above fundamental condition is called the u.f.
condition and the factorization (1) is said to be canonical.

If R has nontrivial divisors of zero, so that ab = 0, a ≠ 0, b ≠ 0, and
if for a = a + ab we construct the factorization (1), with unit factor $\varepsilon_1$,
then for a = a(1 + b) we obtain a second factorization (since b ≠ 0) by
factoring 1 + b canonically: for if 1 + b is not a unit, then the exponents
are not identical, but if 1 + b = ε is a unit, it follows from the assumed
u.f. condition that $\varepsilon_1 = \varepsilon_1 \varepsilon$, and thus, since the units form a group, we
have at once ε = 1, or in other words b = 0, in contradiction to our
assumption concerning b.
From this contradiction of the theorem of unique factorization we have:


Every u.f. ring is an integral domain.


The above definition of a u.f. ring is identical with the definition in IB5, §2.6,
as follows at once from the discussion of prime elements in §8.2.


It is convenient to introduce the following definitions: a prime element
p is called a prime divisor of a if p | a, p ∤ 1 and p ≠ 0. If a and b are
arbitrary elements of the ring and d | a, d | b, so that d is a common divisor,
then the greatest common divisor (GCD), which we denote by (a, b), is
defined, provided it exists, as a common divisor g such that d | g holds for
all common divisors d. Similarly, a v with a | v and b | v is a common
multiple of a and b, and a k with a | k, b | k, k | v for all common multiples
v is the least common multiple (LCM), in symbols [a, b]. For the general
case n > 1, the GCD $(a_1, a_2, \ldots, a_n)$ and the LCM $[a_1, a_2, \ldots, a_n]$ are
defined correspondingly. If all $a_\nu \neq 0$ and if we set $b_\nu = a_1 a_2 \cdots a_n / a_\nu$
(ν = 1, 2, ..., n), it is easy to show that

$$ (a_1, a_2, \ldots, a_n) \cdot [b_1, b_2, \ldots, b_n] = a_1 a_2 \cdots a_n. $$

In the special case n = 2 this relation becomes the simpler (a, b) · [a, b] = ab.

6 Here we are adopting the convention that an empty product (e.g., $\prod_{\lambda=1}^{0} p_\lambda$) always
has the value 1. Similarly, an empty sum has the value 0.

For a ≠ 0 and b ≠ 0, if in (1) we allow zero as an exponent, we may
write $a = \varepsilon_1 \prod_{\lambda=1}^{s} p_\lambda^{\alpha_\lambda}$ and $b = \varepsilon_2 \prod_{\lambda=1}^{s} p_\lambda^{\beta_\lambda}$; so that in u.f. rings we have
the formulas

$$ (a, b) = \prod_{\lambda=1}^{s} p_\lambda^{\min(\alpha_\lambda, \beta_\lambda)} \quad \text{and} \quad [a, b] = \prod_{\lambda=1}^{s} p_\lambda^{\max(\alpha_\lambda, \beta_\lambda)}. \tag{2} $$

Thus the (a, b) and [a, b] necessarily exist, but as long as we make no
convention about normalization, they are determined only up to associates,
so that it would be more correct to regard the symbols as denoting the
corresponding equivalence classes in the sense of §2.3.
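In the ring of rational integers the formulas (2) can be carried out mechanically. The following Python sketch factors two positive integers by trial division, takes the minimum and maximum of the exponents, and checks the relation (a, b) · [a, b] = ab. The normalization to positive representatives and the function names are assumptions of the sketch.

```python
from collections import Counter

def factorize(n):
    """Prime factorization of a positive integer as a Counter {prime: exponent}."""
    factors = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] += 1
            n //= p
        p += 1
    if n > 1:
        factors[n] += 1
    return factors

def from_exponents(exponents):
    """Product of p**e over a mapping {p: e}; the empty product is 1."""
    result = 1
    for p, e in exponents.items():
        result *= p ** e
    return result

def gcd_lcm(a, b):
    """GCD and LCM via the minimum and maximum of the prime exponents."""
    fa, fb = factorize(a), factorize(b)
    primes = set(fa) | set(fb)      # a missing prime has exponent 0 in a Counter
    gcd_value = from_exponents({p: min(fa[p], fb[p]) for p in primes})
    lcm_value = from_exponents({p: max(fa[p], fb[p]) for p in primes})
    return gcd_value, lcm_value

gcd_value, lcm_value = gcd_lcm(360, 84)
print(gcd_value, lcm_value, gcd_value * lcm_value == 360 * 84)   # 12 2520 True
```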

If (a, b) = 1, we say that a and b are coprime.

Finally, from the u.f. condition we obtain the so-called fundamental
lemma of the theory of divisibility:

If p is irreducible, it follows from p | ab that p | a or p | b.

As examples of u.f. rings that do not fall under the special headings of
§2.9 and §2.10 let us mention the polynomial rings $R[x_1, x_2, \ldots, x_n]$
(see IB4, §2.3) in n indeterminates, where R itself is assumed to be a u.f.
ring (see also §2.10, next-to-last paragraph).

The prime elements in polynomial rings are called irreducible 
polynomials, and the other polynomials are said to be reducible. 

Concerning the number of prime elements in a ring we have the 
following generalization of the classical theorem on the infinitude 
of primes. 


Theorem of Euclid: If R is a u.f. ring which is not a field and which has
the property that for every nonunit a ≠ 0 there exists a unit ε such that
a + ε ∤ 1 (i.e., a + ε is not a unit), there exist infinitely many prime
elements, no two of which are associates. (See also the last paragraph of §2.10.)


Proof: No prime divisor of a + ε is an associate of a prime divisor of a.
Since R is not a field, there exist nonunit elements a ≠ 0, and by hypothesis
there also exist prime divisors of a + ε; consequently, in the canonical
factorization of a there cannot appear a complete system of representatives
of the classes of associated prime elements.
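In the ring of rational integers the proof becomes the classical construction. The Python sketch below takes a to be the product of the primes found so far and ε = 1, so that a + 1 is not a unit, and records the smallest prime divisor of a + 1; by the argument of the proof this prime cannot already be in the list. The starting prime 2, the number of steps, and the choice of the smallest prime divisor are assumptions of the sketch.

```python
def smallest_prime_divisor(n):
    """Smallest prime divisor of an integer n >= 2 (trial division)."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return n

primes = [2]
for _ in range(5):
    a = 1
    for p in primes:
        a *= p                                   # a is divisible by every prime found so far
    new_prime = smallest_prime_divisor(a + 1)    # a + 1 is a nonunit coprime to a
    assert new_prime not in primes
    primes.append(new_prime)
print(primes)   # [2, 3, 7, 43, 13, 53]
```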

For u.f. rings with only finitely many nonassociated prime elements see
the end of §2.10.

2.7. On the basis of §2.2 and §2.4 we now regard the classes of associated
elements in a u.f. ring as a partially ordered set. If we let $\hat{a}$ and $\hat{b}$ denote the
equivalence classes defined by a and b, then the GCD class $\widehat{(a, b)}$ is the greatest
common predecessor of the classes $\hat{a}$ and $\hat{b}$ with the property that every common
predecessor $\hat{d}$ of $\hat{a}$ and $\hat{b}$ is also a predecessor of $\widehat{(a, b)}$, from which it follows
that there exists exactly one greatest common predecessor. Similarly, the
LCM class $\widehat{[a, b]}$ is the unique least common successor of $\hat{a}$ and $\hat{b}$ in the sense
that it precedes every common successor. A partially ordered set of this sort,
in which every two elements have a greatest common predecessor and a least
common successor in the above sense, is called a lattice (see IB9, §1). Thus
the classes of associated elements in a u.f. ring form a lattice.


If the construction of (a, b) and [a, b] is regarded as two operations, in
symbols $\widehat{(a, b)} = \hat{a} \sqcup \hat{b}$, $\widehat{[a, b]} = \hat{a} \sqcap \hat{b}$, it is easy to verify the associative law;
and the commutative law is trivial.


2.8. Since every u.f. ring is an integral domain, it can be embedded,
by IB1, §3.2, in a quotient field Q(R). If we write the elements κ ∈ Q(R)
in the form κ = a/b with {a, b} ⊂ R and apply (1) to a and b, we obtain,
for all κ ≠ 0, a representation of the form (1) (allowing negative exponents)
with the corresponding uniqueness properties. We thus arrive at a theory
of divisibility for Q(R) relative to R by defining $\kappa_1 \mid \kappa_2$, with $\{\kappa_1, \kappa_2\} \subset Q(R)$,
to mean that $\kappa_2/\kappa_1 \in R$. Then by (2) we can define the GCD and LCM for
all κ ∈ Q(R). If (a, b) = 1, the fraction a/b for κ is said to be in lowest
terms, and it follows from the u.f. condition applied to Q(R) that this
representation is unique up to associates. The significance of cancellation
in fractions becomes clear from this discussion.

By the lowest common denominator of $a_1/b_1, a_2/b_2, \ldots, a_n/b_n$ we mean
the LCM $[b_1, b_2, \ldots, b_n]$.


2.9. Principal Ideal Rings 


It is natural now to ask for sufficient conditions that a given ring should 
be a u.f. ring. Here we may refer to the result already proved in IB5, §3.4
on principal ideal rings: 


Theorem 1: Every principal ideal ring is a u.f. ring. 


The part of the proof that there exists at least one factorization into
prime elements depended on the so-called divisor-chain condition (IB5,
§2.5). For later use let us state an equivalent condition in ideal-theoretic
terms. Let $a_2 \mid a_1$, $a_3 \mid a_2$, ..., $a_{n+1} \mid a_n$, ... be a divisor chain. By §2.5
the condition $a_{n+1} \mid a_n$ is equivalent to $(a_{n+1}) \supseteq (a_n)$. Now, for an arbitrary
ring we say that the maximal condition is satisfied if in every ascending
sequence of ideals $\mathfrak{a}_1 \subseteq \mathfrak{a}_2 \subseteq \cdots \subseteq \mathfrak{a}_n \subseteq \mathfrak{a}_{n+1} \subseteq \cdots$, from some place onward
all the ideals are equal to one another. The theorem proved in IB5, §3.4
to the effect that the divisor-chain condition holds for principal ideal rings
can thus be formulated as follows:


Theorem 2: The maximal condition is satisfied in every principal ideal
ring.


In §8.2 we shall return to rings with the maximal condition that are not
necessarily principal ideal rings.


In theorem 1 we now have an important sufficient condition for a ring
to be a u.f. ring. Our next aim is to find sufficient conditions for a ring
to be a principal ideal ring.


2.10. Euclidean Rings 


As models for the following concepts we may consider the ring $\mathfrak{G}$ of
rational integers and the polynomial ring R[x] in one indeterminate
(where R is a field) (see IB4, §2.1),⁷ since in each of these two rings there
exists a division algorithm (see below).


Definition: A ring $\mathfrak{E}$ without divisors of zero is said to be Euclidean if
the following conditions are satisfied:


(I) In $\mathfrak{E} - \{0\}$ there is defined a nonnegative integer-valued function w(x),
the so-called (absolute) value function, or valuation.


(II) For every pair of elements a and b in $\mathfrak{E}$ with b ≠ 0 there exist elements
q and r in $\mathfrak{E}$ such that a = qb + r and w(r) < w(b) or r = 0 (division
algorithm).


Lemma: In a Euclidean ring $\mathfrak{E}$ every ideal $\mathfrak{a}$ is a principal ideal (a) in
the sense that all $x \in \mathfrak{a}$ are multiples qa of a.⁸


Proof: Let a ≠ 0 be an element of $\mathfrak{a}$ with the smallest possible value w(a);
by (I) there exists at least one such element, if $\mathfrak{a} \neq \{0\}$. Then for arbitrary
$b \in \mathfrak{a}$ there exists by (II) a representation in the form b = qa + r (r = 0
or w(r) < w(a)). By the module property of an ideal it follows that
$r = b - qa \in \mathfrak{a}$. Thus the minimal property of a implies r = 0.
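The idea of the proof can be observed in the ring of rational integers with w(x) = |x|. The Python sketch below forms a finite sample of the ideal of all linear combinations of two assumed integers, picks a nonzero element of least absolute value, and checks that every sampled element is a multiple of it. A finite sample is of course only an illustration, not a proof.

```python
# Assumed example in the ring of integers with value function w(x) = |x|.
a, b = 84, 360
span = range(-20, 21)

# Finite sample of the ideal of all linear combinations x*a + y*b.
sample = {x * a + y * b for x in span for y in span}

generator = min(abs(v) for v in sample if v != 0)   # nonzero element of least value
print(generator)                                     # 12
print(all(v % generator == 0 for v in sample))       # True
```

The generator found in this way is the GCD of the two integers, in agreement with §2.5.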

If we apply the lemma to the unit ideal (IB5, §3.1) $\mathfrak{E}$, it follows that
$\mathfrak{E} = (e)$, so that for every $x \in \mathfrak{E}$ there exists a q (= q(x)) with x = qe; in
particular, for x = e let e = e'e. Then

$$ x = qe = qe'e = qee' = xe' \quad \text{for all } x \in \mathfrak{E}; \tag{3} $$

that is, e' is the unity element in $\mathfrak{E}$. To sum up, we have

Theorem 3: Every Euclidean ring is a principal ideal ring and thus a
u.f. ring.

7 Since in general the degree $(a_0 + a_1 x + \cdots + a_n x^n) = n$ if $a_n \neq 0$, it is customary
not to assign any degree, or possibly the degree −∞, to the "0" (the zero polynomial)
(in IB4, §2.1 and IB5, §2.11, on the other hand, we set degree 0 = 0). From the definition
of a unit it also follows that all a ∈ R, a ≠ 0, and only these are units of R[x].

8 Note that the existence of a unity element in $\mathfrak{E}$ is not postulated. Cf. §2.5, footnote 4.


In order to show that the ring $\mathfrak{C}$ of integers is Euclidean, we set $w(x) = |x|$, so that (I) is satisfied. As for (II), let us first assume $0 \le a < |b|$. Then $a = 0 \cdot b + a$ is a division formula (II). Now let $a \ge |b| > 0$; then $a$ and $|b|$ are natural numbers with the Archimedean property (see IB1, §3.4) that there exists a natural number $n$ such that $n|b| > a$. But the set of natural numbers is well-ordered by the "<" relation (see IB1, §1.4), so that the subset of all $n$ with $n|b| > a$ contains a smallest number, say $q_1 + 1$; then $(q_1 + 1)|b| > a \ge q_1|b|$. Setting $q = q_1 \cdot \operatorname{sgn} b$ and $r = a - q_1|b|$ and subtracting $q_1|b|$, we obtain

$$0 \le a - q_1|b| = r = |r| < |b|, \quad \text{i.e.,} \quad a = (q_1 \cdot \operatorname{sgn} b)\,b + r = qb + r, \tag{4}$$

so that (II) is again satisfied. Finally, if $a < 0$, then by (4) $(-a) = q_1|b| + r$, $0 \le r < |b|$, so that $a = (-q_1 \cdot \operatorname{sgn} b)\,b - r$, and

$$a = qb + (-r), \quad |-r| < |b| \quad (q = -q_1 \cdot \operatorname{sgn} b), \tag{5}$$

and therefore (II) holds in every case. In (4) the remainder is nonnegative, so that (4) represents a division with smallest positive remainder. We can also bring (5) into the same form: $a = -q_1|b| - r = -(q_1 + 1)|b| + |b| - r$, where $0 < |b| - r < |b|$ for $r \neq 0$.

On the other hand, we could have put (4) in a form with a negative remainder, and then by choosing the remainder, positive or negative, with smaller absolute value we obtain the division with smallest absolute remainder. Except when "$2 \mid b$ and $|r| = |b/2|$", where the two possibilities provide the same absolute value for the remainder, it is easy to show that the $q$ and $r$ are uniquely determined in every case.
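
The two kinds of integer division described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `divide_nonneg` and `divide_least_abs` are hypothetical, and in the tie case $2|r| = |b|$ the sketch arbitrarily keeps the nonnegative remainder.

```python
def divide_nonneg(a: int, b: int) -> tuple[int, int]:
    """Return (q, r) with a == q*b + r and 0 <= r < |b| (smallest nonnegative remainder)."""
    if b == 0:
        raise ZeroDivisionError("b must be nonzero")
    r = a % abs(b)      # with a positive modulus, Python's % is always >= 0
    q = (a - r) // b    # exact, because b divides a - r
    return q, r


def divide_least_abs(a: int, b: int) -> tuple[int, int]:
    """Return (q, r) with a == q*b + r and |r| <= |b|/2 (smallest absolute remainder)."""
    q, r = divide_nonneg(a, b)
    if 2 * r > abs(b):
        # Switch to the negative remainder r - |b| and adjust q by sgn(b).
        r -= abs(b)
        q += 1 if b > 0 else -1
    return q, r
```

For instance, `divide_nonneg(-7, 3)` gives the quotient −3 with remainder 2, while `divide_least_abs(8, 3)` gives the quotient 3 with remainder −1.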

In the polynomial ring $\mathfrak{K}[x]$ in one indeterminate over a field $\mathfrak{K}$ it is obvious that $w(f(x)) = \text{degree } f(x)$ ($f(x) \in \mathfrak{K}[x]$, $f(x) \neq 0$) is a valuation satisfying (I). But the division algorithm (II) also holds, so that $\mathfrak{K}[x]$ is a Euclidean ring. In order to prove (II), we must first show that for two polynomials $f(x) = a_0 x^n + \cdots + a_n$ and $g(x) = b_0 x^m + \cdots + b_m$, $b_0 \neq 0$, with degree $f(x) \ge$ degree $g(x)$ there exists a $q(x) \in \mathfrak{K}[x]$ such that degree $(f(x) - q(x) g(x)) <$ degree $f(x)$; but it is at once clear that $q(x) = (a_0/b_0)\, x^{n-m}$ is satisfactory for the purpose. Now let $f(x)$ and $g(x)$ be arbitrary with $g(x) \neq 0$; then in the case degree $f(x) <$ degree $g(x)$ we can at once satisfy (II) with $q(x) = 0$, $r(x) = f(x)$, and if degree $f(x) \ge$ degree $g(x)$, then let $q(x)$ be so chosen that degree $(f(x) - q(x) g(x))$ is minimal, provided we do not already have the trivial case $g(x) \mid f(x)$. If we set $r(x) = f(x) - q(x) g(x)$ and assume that degree $r(x) \ge$ degree $g(x)$, we have already shown that there exists a $q_1(x) \in \mathfrak{K}[x]$ such that degree $(r(x) - q_1(x) g(x)) <$ degree $r(x)$. But then

$$\text{degree } r(x) > \text{degree } (r(x) - q_1(x) g(x)) = \text{degree } (f(x) - (q(x) + q_1(x))\, g(x)),$$

in contradiction with our having chosen $q(x)$ so as to minimize degree $r(x)$. Thus for $f(x) = q(x) g(x) + r(x)$ we have degree $r(x) <$ degree $g(x)$.

The ring $\mathfrak{K}[x]$ has the further property that in the division formula $f(x) = q(x) g(x) + r(x)$ in (II) the polynomials $q(x)$ and $r(x)$ are uniquely determined, as can easily be shown by an indirect proof based on their degrees.
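
The degree-lowering step $q(x) = (a_0/b_0)\,x^{n-m}$ can be repeated until the remainder has smaller degree than the divisor. The following Python sketch does this over the rational numbers. It rests on assumptions of its own: polynomials are stored as coefficient lists with the highest power first, and the names `strip_zeros` and `poly_divmod` are hypothetical.

```python
from fractions import Fraction


def strip_zeros(p):
    """Drop leading zero coefficients; the zero polynomial becomes []."""
    i = 0
    while i < len(p) and p[i] == 0:
        i += 1
    return p[i:]


def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and degree r < degree g (r == [] means r = 0)."""
    f = strip_zeros([Fraction(c) for c in f])
    g = strip_zeros([Fraction(c) for c in g])
    if not g:
        raise ZeroDivisionError("g must not be the zero polynomial")
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 0)
    r = f
    while len(r) >= len(g):
        shift = len(r) - len(g)      # n - m
        coeff = r[0] / g[0]          # a_0 / b_0
        q[len(q) - 1 - shift] = coeff
        # Subtract coeff * x^shift * g(x); the leading term cancels.
        padded = g + [Fraction(0)] * shift
        r = strip_zeros([rc - coeff * gc for rc, gc in zip(r, padded)])
    return q, r
```

For example, dividing $x^3 - 2x^2 - 4$ by $x - 3$ with `poly_divmod([1, -2, 0, -4], [1, -3])` yields the quotient $x^2 + x + 3$ and the remainder 5.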

The ring $\mathfrak{C}[i]$ of Gaussian integers (IB5, §1.1) is also Euclidean. For $w(\alpha + \beta i) = (\alpha + \beta i)(\alpha - \beta i) = \alpha^2 + \beta^2$ obviously satisfies (I), and if the norm (see §8.1 and IB8, §1.2) $N(z) = z\bar{z}$ of a complex number $z = \alpha + \beta i$ is chosen as its absolute value (so that the distributivity $w(z_1 z_2) = w(z_1) \cdot w(z_2)$ (IB8, (10)) is immediately clear), then (II) is proved as follows. In order to find, for given $z_1$ and $z_2 \neq 0$, the $q$ and $r$ demanded by (II), we first determine a (perhaps fractional) complex number $q'$ such that $z_1 - z_2 q' = 0$, and then in $q' = \gamma' + \delta' i$ we replace the rational numbers $\gamma'$ and $\delta'$ by the nearest integers, say $\gamma$ and $\delta$. With $q = \gamma + \delta i$ and $z_1 = q z_2 + r$ we then have

$$N(r) = N(z_1 - q z_2) = N(z_1 - q' z_2 + (q' - q) z_2) = N((q' - q) z_2) = N(q' - q)\, N(z_2),$$
$$N(q' - q) = (\gamma' - \gamma)^2 + (\delta' - \delta)^2 \le \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2 < 1,$$

so that $N(r) = N(q' - q)\, N(z_2) < N(z_2)$, which satisfies (II).
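
The rounding argument just given translates directly into a small Python sketch. Assumptions of the sketch, not of the text: a Gaussian integer $\alpha + \beta i$ is represented as the pair `(alpha, beta)`, the names `norm` and `gauss_divmod` are hypothetical, and halves are rounded upward (any nearest integer would do).

```python
from fractions import Fraction
from math import floor


def norm(z):
    """N(alpha + beta*i) = alpha^2 + beta^2."""
    a, b = z
    return a * a + b * b


def gauss_divmod(z1, z2):
    """Return (q, r) with z1 = q*z2 + r and N(r) < N(z2)."""
    (a, b), (c, d) = z1, z2
    n = norm(z2)
    if n == 0:
        raise ZeroDivisionError("z2 must be nonzero")
    # Exact quotient q' = z1 * conj(z2) / N(z2) = gamma' + delta'*i.
    gamma_exact = Fraction(a * c + b * d, n)
    delta_exact = Fraction(b * c - a * d, n)
    # Replace gamma', delta' by nearest integers (floor(x + 1/2)).
    gamma = floor(gamma_exact + Fraction(1, 2))
    delta = floor(delta_exact + Fraction(1, 2))
    # r = z1 - q*z2
    r = (a - (gamma * c - delta * d), b - (gamma * d + delta * c))
    return (gamma, delta), r
```

For $z_1 = 27 + 23i$ and $z_2 = 8 + i$ the sketch returns $q = 4 + 2i$ and $r = -3 + 3i$, with $N(r) = 18 < 65 = N(z_2)$.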
In the same way it can be shown that the set of numbers $\alpha + \beta\sqrt{2}$, $\{\alpha, \beta\} \subset \mathfrak{C}$, is a Euclidean ring if we put

$$w(\alpha + \beta\sqrt{2}) = |(\alpha + \beta\sqrt{2})(\alpha - \beta\sqrt{2})| = |\alpha^2 - 2\beta^2|.$$

In general, the set of numbers $\alpha + \beta\sqrt{\delta}$, $\{\alpha, \beta\} \subset \mathfrak{C}$, where $\delta \in \mathfrak{C}$ is not a perfect square, forms a ring, as is easily proved; but in general this ring is not Euclidean, as may be shown by the examples $\delta = -5, -3, +10$, and so forth. For $\delta = -5$, for example,

$$21 = (4 + \sqrt{-5})(4 - \sqrt{-5}) = 3 \cdot 7$$

shows two essentially different factorizations into irreducible factors.9


9 For details see, e.g., Hasse [3], §16.

A subring of a Euclidean ring is not necessarily even a u.f. ring, as may be seen from the ring of even numbers; it is clear that all numbers of the form $2u$ with odd $u$, and only these, are irreducible, and for 60 we have the two factorizations $2 \cdot 30$ and $6 \cdot 10$.

The valuation $w(x)$ is not uniquely determined; in fact, for every fixed integer $\lambda > 0$ the function $\lambda w(x)$ is easily seen to be another valuation. In general, it is possible to construct valuations that are not connected with one another in such a simple way, and for some of them it is necessary, in order to preserve Axiom II, to use other magnitudes in place of $q$ and $r$ in the division formula $a = qb + r$. Thus it is natural to ask how we can normalize the valuations so as to restrict them to convenient forms. In this direction we have10 the following theorem.


Theorem 4: For a Euclidean ring $\mathfrak{E}$ there exist valuations $w(x)$ such that associate elements have the same values; and this property is equivalent to the property that $w(xy) \ge w(x)$ for $x \neq 0$, $y \neq 0$.


Proof: Let $w^*(x)$ be any valuation for which $\mathfrak{E}$ is Euclidean. Let $\bar{a}$ denote the class of all associates of $a$ in $\mathfrak{E}$. Set $w(\bar{a}) = w(a) = \min_{x \in \bar{a}} w^*(x)$. Since $w^*(x)$ is an integer, there exists an $a_m \in \bar{a}$ such that $w(a) = w^*(a_m)$, and therefore

$$w(\bar{a}) = w(a) = w(a_m) = w^*(a_m) \le w^*(a) \quad \text{for all } a \sim a_m. \tag{6}$$

Here Axiom I and the additional condition are obviously satisfied. Now let $a \neq 0$ and let $b$ be arbitrary, define $a_m = \varepsilon a$ (where $\varepsilon$ is a unit) as before and let $b = q a_m + r$ be a division formula with respect to $w^*(x)$. If $r = 0$, then Axiom II with respect to $a$ and $b$ is satisfied for all valuations, since $b = q a_m = q\varepsilon \cdot a$. For $r \neq 0$ it follows from (6) that $w(b - q a_m) \le w^*(b - q a_m) = w^*(r) < w^*(a_m) = w(a)$; thus $w(a) > w(b - q a_m) = w(b - q\varepsilon a) = w(b - q_1 a)$ $(q_1 = q\varepsilon)$, so that $b = q_1 a + r$, and Axiom II is again satisfied. Only the last part of the theorem now remains to be proved. Let us first assume that $w(xy) \ge w(x)$. If $\varepsilon$ is a unit, we have

$$w(a\varepsilon) \ge w(a) = w(a\varepsilon \cdot \varepsilon^{-1}) \ge w(a\varepsilon), \quad \text{and therefore} \quad w(a\varepsilon) = w(a).$$

For the proof of the converse it is sufficient to show that $w(ab) \ge w(a)$ for nonunits. Let us assume to the contrary that $w(ab) < w(a)$. In the division formula $a = q \cdot ab + r$ we then have $r \neq 0$, since $b \not\sim 1$, and therefore $w(a) > w(ab) > w(r) = w((1 - qb)\,a)$; but then the strict inequality shows that $1 - qb = b_1$ is not a unit. The same procedure


10 H. J. Claus, Über die Partialbruchzerlegung in nicht notwendig kommutativen Ringen. Journ. f. reine u. angew. Math. (Crelle) 194 (1955), 88-100.

applied to $ab_1$ and $a$ in place of $ab$ and $a$ leads to a nonunit element $b_2$, and thus to the inequality $w(a) > w(ab) > w(ab_1) > w(ab_2)$. Continuation of the procedure produces a nonterminating, strictly monotone decreasing sequence of values $w(ab_\nu)$, in contradiction to Axiom I.

On the basis of this theorem we may now adjoin to the Axioms I and II 
the following axiom: 


(III) From $a \sim b$ it follows that $w(a) = w(b)$, from which we may also assume $w(ab) \ge w(a)$ for all $a \neq 0$, $b \neq 0$, whereby we have reached agreement with IB5, §2.8.


We can state the further result: if $a$ is a proper factor of $b$, i.e., $a \mid_{\mathrm{pr}} b$, then $w(a) < w(b)$, and if $w(a) = w(1)$, then $a \sim 1$, and conversely.


Proof: Since $b \nmid a$, we have $r \neq 0$ in every division formula $a = qb + r$. If we set $b = ac$, then

$$w(b) > w(r) = w(a - qb) = w(a(1 - qc)) \ge w(a),$$

as desired. The second statement follows immediately.

The above development of the theory of divisibility is based on the theory of principal ideal rings and may thus be regarded as an ideal-theoretic method. If we begin with a Euclidean ring $\mathfrak{E}$ in the first place, we can reach the same results by a different method, which is more elementary and has the advantage of being constructive, namely, by explicitly calculating the GCD rather than by proving its existence from the properties of a principal ideal. For this calculation we use the Euclidean algorithm in the following way. Let $a$ and $b$, $b \neq 0$, be arbitrary elements of $\mathfrak{E}$; then Axiom II allows us to set up in succession the division formulas:

$$\begin{aligned}
a &= q_1 b + r_1, & w(r_1) &< w(b),\\
b &= q_2 r_1 + r_2, & w(r_2) &< w(r_1),\\
r_1 &= q_3 r_2 + r_3, & w(r_3) &< w(r_2),\\
&\;\;\vdots\\
r_{n-2} &= q_n r_{n-1} + r_n, & w(r_n) &< w(r_{n-1}),\\
r_{n-1} &= q_{n+1} r_n + r_{n+1}, & w(r_{n+1}) &< w(r_n) \text{ or } r_{n+1} = 0.
\end{aligned} \tag{7}$$

The sequence is to be regarded as terminating as soon as a zero remainder occurs. Since $w(b) > w(r_1) > w(r_2) > \cdots$, it follows from Axiom I that such a remainder must eventually occur. If we run through the algorithm (7) from the first line down to the last, we see that every common divisor of $a$ and $b$ is a divisor of all the $r_\nu$ $(1 \le \nu \le n + 1)$, and on the other hand, if we run through (7) from the last line up to the first, assuming $r_{n+1} = 0$, we have $r_n \mid r_{n-1}$, $r_n \mid r_{n-2}$, ..., $r_n \mid b$, $r_n \mid a$. Taken together, these results show that the last nonzero remainder $r_n$ is the common divisor of greatest absolute value, a property which, on the basis of Axiom III, can be used as a definition of the GCD, for which we now have the following theorem.


GCD Theorem: Every common divisor is a divisor of the GCD. 


It is easy to prove that $(a_1, a_2, \ldots, a_n) = ((a_1, a_2, \ldots, a_{n-1}), a_n)$ and also that the GCD theorem holds for arbitrary $n > 1$ (cf. the last paragraph of §2.7).

Finally, if we begin at the next-to-last equation in (7) and work backwards, we obtain a representation of the form $(a, b) = r_n = a x_0 + b y_0$. Then the fundamental lemma, and with it the u.f. theorem, can be proved in exactly the same way as for principal ideal rings.
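
For the ring of integers, the chain of division formulas (7) and the representation of the GCD as a sum of multiples can be sketched as follows. Note the assumption: instead of working backwards from the next-to-last equation, this sketch carries the coefficients forward through the divisions (the usual "extended" variant), which yields the same kind of representation. The function names are hypothetical.

```python
def euclid_remainders(a: int, b: int) -> list[int]:
    """Remainders r_1, r_2, ... of scheme (7), ending with the first zero remainder."""
    remainders = []
    while b != 0:
        a, b = b, a % b
        remainders.append(b)
    return remainders


def bezout(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x0, y0) with g = (a, b) and a*x0 + b*y0 == g."""
    x0, x1, y0, y1 = 1, 0, 0, 1
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        # Keep the invariants a = a_orig*x0 + b_orig*y0, b = a_orig*x1 + b_orig*y1.
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0
```

For $a = 1071$, $b = 462$ the remainders are 147, 21, 0, so the last nonzero remainder is 21, and `bezout(1071, 462)` returns `(21, -3, 7)`, i.e. $21 = 1071 \cdot (-3) + 462 \cdot 7$.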

In analogy with the GCD, we can now define an LCM $[a_1, a_2, \ldots, a_n]$ as a common nonzero multiple of least absolute value, for which we obtain the following theorem:


Theorem of the LCM: For every common multiple $v_1$ we have $v = [a_1, a_2, \ldots, a_n] \mid v_1$. In particular, all the LCM's are associates.


Proof: We apply the Euclidean algorithm to $v$, $v_1$ and obtain the GCD $(v, v_1) = d$. From the minimal property of $w(v)$ and from $d \mid v$ it follows that $w(d) = w(v)$, so that $d \nmid_{\mathrm{pr}} v$; that is, $d \sim v$, and thus, in view of $d \mid v_1$, we have at once $v \mid v_1$.

Finally, we must mention a third way of constructing the theory of divisibility in Euclidean rings, namely by first proving the u.f. theorem, i.e., without using the concept of the GCD, and then defining the GCD and LCM by (2). But in order to obtain the important representation of the GCD $(a_1, a_2, \ldots, a_n)$ as a sum of multiples $a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$, we must then proceed either by way of the principal-ideal property (if we are satisfied with proving the existence of the desired representation) or else by way of the Euclidean algorithm.

In order to prove the u.f. theorem directly (i.e., without using the GCD) we require a sharpening of the above Axiom III,11 which will also be necessary for the discussion of partial fractions in §2.11 below. In place of Axiom III we now require the following axiom:


(III') From $w(a) < w(b)$ it follows that $w(ac) < w(bc)$ for all $c \neq 0$, and conversely.


Corollary 1: From $w(a) = w(b)$ it follows that $w(ac) = w(bc)$ for all $c \neq 0$, and conversely.


11 No immediate proof of the u.f. theorem is known at present without this sharpening of Axiom III. It is possible that all Euclidean rings are no longer included; however, up to the present no known Euclidean rings fail to satisfy the new requirement. Thus it would be of interest to know whether a theorem analogous to Theorem 4 is valid.

Corollary 2: From III' follows III.
Corollary 3: From $w(a) < w(b)$ and $w(c) < w(d)$ follows $w(ac) < w(bd)$.


Proof: Corollary 1 is easy to prove indirectly. As for Corollary 2, it is sufficient by Theorem 4 to prove that $w(ab) \ge w(a)$. But if we had $w(ab) < w(a) = w(a \cdot 1)$, it would follow that $w(1) > w(b)$ and thus $w(b) = w(1 \cdot b) > w(b^2)$, and then $w(b^2) > w(b^3)$, and so forth; but the chain $w(1) > w(b) > w(b^2) > \cdots$ would be in contradiction to I. Corollary 3 follows from $w(ac) < w(bc) < w(bd)$.


Proof of the u.f. theorem in Euclidean rings under the assumptions I, II and III'.12 We make the induction hypothesis that the theorem is true for all $x$ with $w(x) < w(a)$ and assume the existence of an $a$ contradicting the assertion. Now let $p \not\sim 1$ be a divisor of $a$ with the smallest possible value, from which it follows that $p$ is irreducible. Let $a = bp$. Since $p \not\sim 1$, we have $b \mid_{\mathrm{pr}} a$, and thus $w(b) < w(a)$, so that $b$ has a canonical factorization, and $a$ has at least one factorization (1). Let $q \not\sim 1$ be an irreducible factor of the second (assumed) factorization (1) of $a$, with $a = qc$. The two factorizations cannot have associated irreducible factors, since by cancellation of such factors we would obtain an element of smaller value than $w(a)$, which would therefore, by the induction hypothesis, have a unique factorization (1). Consequently, the original factorizations of $a$ cannot, after all, be different from each other. In $a = pb = qc$ we now insert the division formulas $q = q_1 p + r_1$ and $c = q_2 p + r_2$, with $r_1 \neq 0$ and $r_2 \neq 0$, since $p \nmid q$ and $p \nmid c$. We thus obtain:

$$a = pb = (q_1 p + r_1)(q_2 p + r_2) = p(q_1 q_2 p + r_1 q_2 + r_2 q_1) + r_1 r_2. \tag{8}$$

From the minimal property of $p$ it follows for the division remainders that

$$w(r_1) < w(p) \le w(q), \qquad w(r_2) < w(p) \le w(c),$$

and thus by Corollary 3:

$$w(r_1 r_2) < w(qc) = w(a),$$

so that $r_1 r_2$ has a unique factorization. Since $w(r_i) < w(p)$ $(i = 1, 2)$, the factor $p$ cannot occur in this factorization, but by (8) we have $p \mid r_1 r_2$, which provides the desired contradiction.13


12 See H. Klappauf, Beweis des Fundamentalsatzes der Zahlentheorie. Jahresbericht DMV 45, 130 Kursiv. The first proof was given by Zermelo.

13 In this form of proof by induction it is unnecessary to prove the initial statement (although its correctness for all $a \sim 1$, i.e., $w(a) = w(1)$, is obvious, since units are

If $\mathfrak{R}$ is merely a u.f. ring, then on the basis of the fact that $Q(\mathfrak{R})[x]$ is Euclidean and is thus a u.f. ring, it can be shown (see IB5, §2.14) that $\mathfrak{R}[x_1] = \mathfrak{R}_1$ is also a u.f. ring (Gauss). By successive application of this theorem, the same result follows for $\mathfrak{R}_1[x_2] = \mathfrak{R}[x_1, x_2]$, $\mathfrak{R}[x_1, x_2, x_3]$, ..., as was mentioned at the end of §2.6 (see also IB5, §2.14).

If now in the field P of rational numbers we consider the set $\mathfrak{M}$ of all fractions $s/t$, $\{s, t\} \subset \mathfrak{C}$, $3 \nmid t$, it is easy to show that $\mathfrak{M}$ is a ring, a so-called quotient ring. All $s/t$ with $(s, 3) = (t, 3) = 1$ form the group of units. Apart from associates, the only prime element is 3, and it is clear that every number in $\mathfrak{M}$ can be represented uniquely in the form $\varepsilon 3^\alpha$, where $\varepsilon$ is a unit and $\alpha \ge 0$ is an integer. With the definition $w(\varepsilon 3^\alpha) = \alpha$, it is easy to show that $\mathfrak{M}$ is Euclidean, and to generalize to the case of more than one prime element. On the other hand, it is clear that the assumptions for the Euclidean theorem (see §2.6) hold for the rings $\mathfrak{C}$, $\mathfrak{C}[i]$, $\mathfrak{K}[x]$ (where $\mathfrak{K}$ is a field).


2.11. Decomposition into Partial Fractions in Euclidean Rings 


We now assume that $\mathfrak{E}$ satisfies the Axioms I, II and III'. Let $Q = Q(\mathfrak{E})$ be the quotient field of $\mathfrak{E}$. Those elements $u \in Q$ that are also in $\mathfrak{E}$ are called integers. Also, $u = a/b$ ($\{a, b\} \subset \mathfrak{E}$) is said to be a proper fraction if $a = 0$ or $w(a) < w(b)$. It is a consequence of III' that the property of being a "proper fraction" is invariant14 under cancellation or under multiplication of numerator and denominator by the same number.

If $a$ and $b$ are integers with $(a, b) = 1$, then $a/b$ is called a partial fraction if and only if $b = 1$ or $w(a) < w(b)$.

If $s$ and $t$ are integers with $(s, t) = 1$, and if $t = \prod_{\rho=0}^{r} q_\rho$, where the $q_\rho$ are coprime integers and $q_0 = 1$, then

$$\frac{s}{t} = \sum_{\rho=0}^{r} \frac{a_\rho}{q_\rho} \qquad (a_\rho \text{ integers}) \tag{9}$$

is called a decomposition into partial fractions (abbreviated DPF) of the


always irreducible): for if the correctness of the statement $A(x)$ for all $x < n$ implies the correctness of $A(n)$ $(n \ge 1)$, then $A(n)$ holds for all natural numbers $n$, since the correctness of $A(1)$ is now included in the proof: i.e., the induction hypothesis is true because it consists of the (empty) statement "$A(x)$ holds for all $x < 1$," which is certainly not false (cf. IB1, §1.4). But if for the argument by induction it is necessary that there should exist an $x_0 < n$ for which $A(x_0)$ is correct, then the above form of proof by induction cannot be applied.

14 Under Axioms I, II, III alone it is easy to construct valuations in $\mathfrak{C}$ for which this invariance no longer holds (see Ostmann, Euklidische Ringe mit eindeutiger Partialbruchzerlegung, Journ. f. reine u. angew. Math. (Crelle) 188 (1950), 150-161).

first form, if all the $a_\rho/q_\rho$ are partial fractions. If $t = \prod_{\rho=1}^{r} p_\rho^{\alpha_\rho}$ is a canonical factorization, then

$$\frac{s}{t} = a_0 + \sum_{\rho=1}^{r} \sum_{\kappa=1}^{\alpha_\rho} \frac{a_{\rho\kappa}}{p_\rho^{\kappa}} \tag{10}$$

with all the $a_{\rho\kappa}$ integers, $w(a_{\rho\kappa}) < w(p_\rho)$ or $a_{\rho\kappa} = 0$, is called a DPF of the second form.


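Theorem: Every $x \in Q(\mathfrak{E})$ has DPF's of both forms.


Proof: As for (9), it is sufficient to consider the case $r = 2$. Since $(q_1, q_2) = 1$, there exist in $\mathfrak{E}$ two elements $x_0$ and $y_0$ such that $1 = q_1 x_0 + q_2 y_0$; thus

$$\frac{s}{t} = \frac{s q_1 x_0 + s q_2 y_0}{q_1 q_2} = \frac{s x_0}{q_2} + \frac{s y_0}{q_1}.$$

But now from the division formulas $s y_0 = a_1' q_1 + a_1$, $s x_0 = a_2' q_2 + a_2$, with $a_1' + a_2' = a_0$, we deduce (9) as follows: from $(s, t) = 1$ we have $(s x_0, q_2) = (s y_0, q_1) = 1$, since otherwise $q_1 q_2 = t$ would not be the least common denominator; thus we must also have $(a_i, q_i) = 1$ $(i = 1, 2)$. To obtain (10) we start from (9) with $q_\rho = p_\rho^{\alpha_\rho}$. If in $a_\rho/p_\rho^{\alpha_\rho}$ we insert the division formula $a_\rho = b_\rho p_\rho + a_{\rho\alpha_\rho}$, we have

$$\frac{a_\rho}{p_\rho^{\alpha_\rho}} = \frac{b_\rho}{p_\rho^{\alpha_\rho - 1}} + \frac{a_{\rho\alpha_\rho}}{p_\rho^{\alpha_\rho}},$$

where $a_{\rho\alpha_\rho}/p_\rho^{\alpha_\rho}$ is already a partial fraction. By successive application of this procedure to the first summands on the right, we finally obtain (10). For example, $1/12 = 1/(3 \cdot 4)$ has four DPF's of the first form:

$$\frac{1}{12} = \frac{1}{3} - \frac{1}{4} = -1 + \frac{1}{3} + \frac{3}{4} = 1 - \frac{2}{3} - \frac{1}{4} = -\frac{2}{3} + \frac{3}{4};$$

on the other hand, it is easy to show that for $\mathfrak{E} = \mathfrak{K}[x]$ both these DPF's are unique in $Q = \mathfrak{K}(x)$, if we take $w(f(x)) = \text{degree } f(x)$.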
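
The construction in the proof of (9) for $r = 2$ can be followed step by step for ordinary integers. The sketch below is an illustration only: it assumes positive coprime $q_1$, $q_2$, uses division with smallest nonnegative remainder (so it produces just one of the possible DPF's), and the names `bezout` and `dpf_first_form` are hypothetical.

```python
from fractions import Fraction


def bezout(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) with g = (a, b) and a*x + b*y == g."""
    x0, x1, y0, y1 = 1, 0, 0, 1
    while b != 0:
        q, r = divmod(a, b)
        a, b = b, r
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return a, x0, y0


def dpf_first_form(s: int, q1: int, q2: int) -> tuple[int, int, int]:
    """Return (a0, a1, a2) with s/(q1*q2) == a0 + a1/q1 + a2/q2, 0 <= a_i < q_i."""
    g, x0, y0 = bezout(q1, q2)          # 1 = q1*x0 + q2*y0
    if g != 1:
        raise ValueError("q1 and q2 must be coprime")
    a1_prime, a1 = divmod(s * y0, q1)   # s*y0 = a1'*q1 + a1
    a2_prime, a2 = divmod(s * x0, q2)   # s*x0 = a2'*q2 + a2
    return a1_prime + a2_prime, a1, a2
```

For $1/12$ with $q_1 = 3$, $q_2 = 4$ this gives $(a_0, a_1, a_2) = (-1, 1, 3)$, i.e. the decomposition $-1 + 1/3 + 3/4$ from the example above.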


The DPF's in $\mathfrak{K}(x)$ have a well-known application to the integration of rational functions. As a geometric application of the DPF's in the field $Q(\mathfrak{C}) = P$ of the rational numbers, let us mention the construction of a regular $n$-polygon with composite $n$ of the form $n = 2^\alpha q_1 q_2 \cdots q_s$, $\alpha \ge 0$, where the $q_i$ are odd and pairwise coprime, and the regular $q_i$-polygon is assumed to be

15 On the existence of a division algorithm in $\mathfrak{K}[x]$ with respect to other valuations and for DPF's in general, see the references in footnotes 10 and 14.

constructible with ruler and compass (e.g., the 15-polygon). In order to construct the angle $2\pi/n = 2\pi[1/(2^\alpha q_1 \cdots q_s)]$ we construct the DPF of the first form of $1/n$:

$$\frac{2\pi}{n} = \sum_{\sigma=0}^{s} \frac{2\pi a_\sigma}{q_\sigma}. \tag{11}$$

Since the $a_\sigma$ $(\sigma = 0, 1, \ldots, s)$ are integers, the angles $(2\pi/q_\sigma)\,|a_\sigma|$ $(\sigma = 0, 1, \ldots, s;\ q_0 = 2^\alpha)$ are constructible by hypothesis, and thus we can also construct the angle $2\pi/n$ (see also the last paragraph of §2.12).


2.12. Number of Divisors, Sum of Divisors; 
Certain Special Types of Numbers 


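Let $\mathfrak{R}$ be a u.f. ring and let $\mathfrak{P}$ be a system of representatives of the classes of associate prime elements, excluding the class of units. If $n \in \mathfrak{R}$ has the canonical factorization $n = \varepsilon \prod_{i=1}^{r} p_i^{\alpha_i}$, $p_i \in \mathfrak{P}$, then the divisors of $n$ obviously have the form $\eta \prod_{i=1}^{r} p_i^{\beta_i}$, $0 \le \beta_i \le \alpha_i$, where $\eta$ is a unit. By a normalized factor $d$ of $n$ with respect to $\mathfrak{P}$ we mean a $d = \prod_{i=1}^{r} p_i^{\beta_i}$, $0 \le \beta_i \le \alpha_i$, and the symbol $\sum_{d \mid n}$ means that $d$ runs only through the normalized factors. In the ring $\mathfrak{C}$, unless otherwise noted, the set of prime numbers $> 1$ (see §2.6) will always be taken for $\mathfrak{P}$, so that in this case $d$ runs through all the positive divisors of $n$.

We now define the function

$$\sigma_k(n) = \sum_{d \mid n} d^k \qquad (k \text{ real}). \tag{12}$$

For $k = 0$ we obtain the number of divisors $\sigma_0(n) = \tau(n)$, and for $k = 1$ the sum of the divisors $\sigma_1(n) = \sigma(n)$. The values of these functions are given by

$$\tau(n) = \prod_{i=1}^{r} (\alpha_i + 1), \qquad \sigma_k(n) = \prod_{i=1}^{r} \frac{p_i^{k(\alpha_i + 1)} - 1}{p_i^{k} - 1} \qquad (k \neq 0). \tag{13}$$

Proof: Since there are exactly $\alpha_i + 1$ possibilities for the $\beta_i$ in $d = \prod_{i=1}^{r} p_i^{\beta_i}$, the stated value of $\tau(n)$ follows at once by complete induction. Furthermore,

$$\sum_{d \mid n} d^k = \sum_{\beta_1 = 0}^{\alpha_1} \sum_{\beta_2 = 0}^{\alpha_2} \cdots \sum_{\beta_r = 0}^{\alpha_r} p_1^{k\beta_1} p_2^{k\beta_2} \cdots p_r^{k\beta_r} = \prod_{i=1}^{r} \sum_{\beta_i = 0}^{\alpha_i} p_i^{k\beta_i} = \prod_{i=1}^{r} \left(1 + p_i^{k} + \cdots + p_i^{k\alpha_i}\right) = \prod_{i=1}^{r} \frac{p_i^{k(\alpha_i + 1)} - 1}{p_i^{k} - 1}$$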


as the sum of a geometric progression if $k \neq 0$.

By (13) we have at once:

$$\text{From } (u_1, u_2) = 1 \text{ follows } \sigma_k(u_1 u_2) = \sigma_k(u_1) \cdot \sigma_k(u_2) \qquad (k \text{ real}). \tag{14}$$

In general, a function $f(x)$ is called multiplicative if $(x, y) = 1$ implies $f(xy) = f(x) f(y)$, and distributive if this functional equation holds for arbitrary $x$, $y$. Thus the function $\sigma_k(n)$ is multiplicative.

Also, a function $F(x)$ is called the summatory function of $f(x)$ if $F(x) = \sum_{d \mid x} f(d)$.

It is easy to see that if $f(x)$ is multiplicative, then so is $F(x)$.

In $\mathfrak{C}$ a positive number is said to be $\kappa$-perfect if $\sigma(n) = \kappa n$, $\kappa$-deficient if $\sigma(n) < \kappa n$, and $\kappa$-abundant if $\sigma(n) > \kappa n$; and for $\kappa = 2$ the given number is simply called perfect, deficient, and abundant, respectively. For example, 6, 28, 496, and 8128 are perfect, 4 is deficient, and 12 is abundant. For $\kappa$-perfect numbers $\kappa$ is necessarily rational.

Since $\sigma(1) = 1 < 2$ and $\sigma(p) = p + 1 < 2p$, the prime numbers $p$ $(p > 1)$ are deficient; and since 1 is the sole 1-perfect number, only the case $\kappa > 1$ is of interest. It is easy to show that every multiple of a $\kappa$-abundant number is $\kappa$-abundant.
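
Formula (13) and the classification for $\kappa = 2$ can be checked numerically. The following sketch is restricted, by assumption, to positive integers $n$ and nonnegative integer exponents $k$ (the text allows real $k$); the helper names `factorize`, `sigma`, and `classify` are hypothetical, and the factorization uses simple trial division.

```python
from math import prod


def factorize(n: int) -> dict[int, int]:
    """Canonical factorization of n >= 1 as {prime: exponent}, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def sigma(n: int, k: int = 1) -> int:
    """sigma_k(n) computed from formula (13); k = 0 gives the number of divisors."""
    f = factorize(n)
    if k == 0:
        return prod(a + 1 for a in f.values())
    return prod((p ** (k * (a + 1)) - 1) // (p ** k - 1) for p, a in f.items())


def classify(n: int) -> str:
    """Classify n as perfect, deficient, or abundant (the case kappa = 2)."""
    s = sigma(n)
    if s == 2 * n:
        return "perfect"
    return "deficient" if s < 2 * n else "abundant"
```

For $n = 12$ this gives $\tau(12) = 6$ and $\sigma(12) = 28 > 24$, so 12 is abundant, in agreement with the examples in the text.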


Theorem 5 (Euclid-Euler): If $n$ is even and perfect, then $n = 2^\rho(2^{\rho+1} - 1)$, $\rho > 0$, and $2^{\rho+1} - 1 = p$ is a prime number (Euler), and conversely every such number is an even perfect number.


Proof: Since $n$ is even, we may set $n = 2^\rho u$ (with $u$ odd). Then if $n$ is perfect,

$$\sigma(n) = \sigma(2^\rho u) = \sigma(2^\rho)\,\sigma(u) = (2^{\rho+1} - 1)\,\sigma(u) = 2n = 2^{\rho+1} u,$$

so that $2^{\rho+1} \mid \sigma(u)$, and thus $\sigma(u) = 2^{\rho+1}\lambda$, $\lambda \ge 1$. Consequently,

$$2^{\rho+1} u = (2^{\rho+1} - 1)\,\sigma(u) = (2^{\rho+1} - 1)\,2^{\rho+1}\lambda, \quad \text{i.e.,} \quad u = (2^{\rho+1} - 1)\lambda,$$

and thus, since $\rho > 0$, we have $\sigma(u) \ge (2^{\rho+1} - 1)\lambda + \lambda = 2^{\rho+1}\lambda = \sigma(u)$, so that the equality sign must hold, and $u$ has exactly two factors; therefore $u$ is prime and $\lambda = 1$; i.e., $u = 2^{\rho+1} - 1$. The converse follows at once from

$$\sigma(n) = \sigma(2^\rho p) = \sigma(2^\rho)\,\sigma(p) = (2^{\rho+1} - 1)(p + 1) \quad (\text{since } \rho > 0)$$
$$= p \cdot 2^{\rho+1} = 2 \cdot 2^\rho p = 2n.$$

Numbers of the form $2^p - 1$ are called Mersenne numbers. Since $2^{ab} - 1 = (2^a)^b - 1 = (2^a - 1)(1 + 2^a + \cdots + 2^{a(b-1)})$, we have the following theorem:


Theorem 6: A number $2^p - 1$ can be a Mersenne prime only if $p$ is prime.

Thus the search for even perfect numbers is reduced to the search for Mersenne primes. It is still an open question whether odd perfect numbers exist and also whether there are infinitely many Mersenne primes.
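
As a small numerical check of Theorems 5 and 6, the sketch below generates even perfect numbers from Mersenne primes and confirms $\sigma(n) = 2n$ by brute force. The trial-division primality test and the function names are assumptions of this illustration; they are adequate only for small exponents.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test (adequate for the small numbers used here)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def even_perfect_numbers(max_rho: int) -> list[int]:
    """All n = 2^rho * (2^(rho+1) - 1), 1 <= rho <= max_rho, with 2^(rho+1) - 1 prime."""
    return [
        2 ** rho * (2 ** (rho + 1) - 1)
        for rho in range(1, max_rho + 1)
        if is_prime(2 ** (rho + 1) - 1)
    ]


def divisor_sum(n: int) -> int:
    """sigma(n) by direct summation over all positive divisors."""
    return sum(d for d in range(1, n + 1) if n % d == 0)
```

With `max_rho = 6` the sketch returns 6, 28, 496, 8128. The exponent 11 shows that the condition of Theorem 6 is necessary but not sufficient: 11 is prime, yet $2^{11} - 1 = 2047 = 23 \cdot 89$.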

Still less is known about amicable numbers,16 i.e., pairs $a$, $b$ with $\sigma(a) = \sigma(b) = a + b$, or in other words $\sum_{d \mid a,\, d < a} d = b$, $\sum_{d \mid b,\, d < b} d = a$. Example: 220, 284.

From the factorization

$$a^u + b^u = (a + b)(a^{u-1} - a^{u-2} b + \cdots - a b^{u-2} + b^{u-1})$$

for odd $u$ we have at once the following theorem:


Theorem 7: The number $2^n + 1$ can be prime only if $n = 2^\nu$ $(\nu \ge 0)$.


Numbers of the form $2^{2^\nu} + 1$ are called Fermat numbers or Gauss numbers or, if prime, Fermat (or Gauss) primes. For $\nu = 0, 1, 2, 3, 4$ we have the primes 3, 5, 17, 257, 65537, but no further primes of this form are known. At least the number for $\nu = 5$ is not a prime, in view of the fact that $641 \mid 2^{2^5} + 1$ (Euler). In any case, $(2^{2^\nu} + 1, 2^{2^\mu} + 1) = 1$ for $\nu \neq \mu$.
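
The numerical statements about Fermat numbers and the amicable pair can be verified directly; Python's arbitrary-precision integers make this a short exercise. The function names below are hypothetical and the sketch checks only the small cases mentioned in the text.

```python
from itertools import combinations
from math import gcd


def fermat_number(v: int) -> int:
    """The Fermat number 2^(2^v) + 1."""
    return 2 ** (2 ** v) + 1


def proper_divisor_sum(n: int) -> int:
    """Sum of the divisors d of n with d < n."""
    return sum(d for d in range(1, n) if n % d == 0)


# The first five Fermat numbers are the primes listed in the text.
first_five = [fermat_number(v) for v in range(5)]

# Euler's divisor of the sixth Fermat number (v = 5).
euler_divides = fermat_number(5) % 641 == 0

# Distinct Fermat numbers are coprime (checked here for v = 0, ..., 6).
pairwise_coprime = all(
    gcd(fermat_number(v), fermat_number(u)) == 1
    for v, u in combinations(range(7), 2)
)
```

The same helper confirms the amicable pair: `proper_divisor_sum(220)` is 284 and `proper_divisor_sum(284)` is 220.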


As a supplement to the last paragraph in §2.11, let us mention the following theorem of Gauss: a regular $p$-polygon, where $p$ is a prime number, is constructible with ruler and compass if and only if $p$ is a Gauss prime; and all constructible $n$-gons are obtained by replacing the $q_i$ in (11) with Gauss primes.


3. Continued Fractions 


3.1. By the Euclidean algorithm §2 (7) the fraction $a/b$ admits the representation:

$$\frac{a}{b} = q_1 + \frac{r_1}{b} = q_1 + \cfrac{1}{q_2 + \cfrac{r_2}{r_1}} = \cdots = q_1 + \cfrac{1}{q_2 + \cfrac{1}{q_3 + \cdots + \cfrac{1}{q_{n+1}}}}. \tag{15}$$

16 For a detailed account of amicable numbers, see A. Wulf: Die befreundeten Zahlen nebst einem Ausblick auf die vollkommenen und aliquoten Zahlen. Göttingen, 1950, hectographed.

In general, a fraction of the form

$$b_0 + \cfrac{a_1}{b_1 + \cfrac{a_2}{b_2 + \cdots + \cfrac{a_n}{b_n}}} \tag{16}$$

is called a continued fraction, the $a_i$ are the partial numerators, the $b_i$ the partial denominators, and $b_0$ is the first term. If all $a_i = 1$, all $b_i$ are rational integers, and $b_i > 0$ for $i > 0$, then (16) is called a regular continued fraction. Such a fraction can obviously be normalized so as to make the last partial denominator greater than unity. In the present section we shall always assume that this has been done. The notation

$$b_0 + \cfrac{1}{b_1 + \cfrac{1}{b_2 + \cdots + \cfrac{1}{b_n}}} = [b_0, b_1, \ldots, b_n]$$

is in common use. For convenience in the statement of proofs, it is often desirable to arrange that the $b_\nu$ are real and positive and to include this property in the definition of the symbol. Then we can show at once that for $n \ge 1$

$$\begin{aligned}
{[b_0, b_1, \ldots, b_n]} &= b_0 + \frac{1}{[b_1, \ldots, b_n]} = [b_0, [b_1, \ldots, b_n]],\\
[b_0, b_1, \ldots, b_n] &= \left[b_0, \ldots, b_{n-2}, b_{n-1} + \frac{1}{b_n}\right] \quad \text{(recursion formula)}.
\end{aligned} \tag{17}$$

From the fact that

$$[b_0 + g, b_1, \ldots, b_n] = g + [b_0, b_1, \ldots, b_n]$$

we see that without loss of generality we can confine our attention to continued fractions for which $b_0 > 0$, as will be assumed throughout the present section. The reduced fraction

$$\frac{A_\nu}{B_\nu} = [b_0, b_1, \ldots, b_\nu] \tag{18}$$

is called the $\nu$th convergent, $A_\nu$ is the $\nu$th partial numerator, and $B_\nu$ is the $\nu$th partial denominator. Obviously $A_0 = b_0$, $B_0 = 1$. For convenience we also introduce


$$A_{-2} = 0, \quad A_{-1} = 1, \quad B_{-2} = 1, \quad B_{-1} = 0. \tag{19}$$

Then we have the fundamental linear recursion formulas:

$$A_\nu = b_\nu A_{\nu-1} + A_{\nu-2}, \qquad B_\nu = b_\nu B_{\nu-1} + B_{\nu-2} \qquad (\nu \ge 0) \tag{20}$$

and the identity

$$A_\nu B_{\nu-1} - A_{\nu-1} B_\nu = (-1)^{\nu-1} \qquad (\nu \ge -1). \tag{21}$$

Proof: For $\nu = 0$ the formulas (20) follow from (19). If (20) holds for $\nu - 1$ $(\nu \ge 1)$, then by the second equation (17), again assuming that the $b_i$ are real and positive,

$$\frac{A_\nu}{B_\nu} = \frac{\left(b_{\nu-1} + \frac{1}{b_\nu}\right) A_{\nu-2} + A_{\nu-3}}{\left(b_{\nu-1} + \frac{1}{b_\nu}\right) B_{\nu-2} + B_{\nu-3}} = \frac{b_\nu (b_{\nu-1} A_{\nu-2} + A_{\nu-3}) + A_{\nu-2}}{b_\nu (b_{\nu-1} B_{\nu-2} + B_{\nu-3}) + B_{\nu-2}} = \frac{b_\nu A_{\nu-1} + A_{\nu-2}}{b_\nu B_{\nu-1} + B_{\nu-2}},$$

where the last equation follows from the induction hypothesis. In order to show that for integral $b_\nu$ the last fraction is already reduced, we consider the $A_\nu$, $B_\nu$ as defined recursively by (20) and (19). Again the last equation holds, and (18) follows from it by induction; so it remains to prove (21). But for $\nu = -1$, (21) follows from (19), and by complete induction

$$\begin{aligned}
A_\nu B_{\nu-1} - A_{\nu-1} B_\nu &= (b_\nu A_{\nu-1} + A_{\nu-2})\, B_{\nu-1} - A_{\nu-1} (b_\nu B_{\nu-1} + B_{\nu-2})\\
&= -(A_{\nu-1} B_{\nu-2} - A_{\nu-2} B_{\nu-1}) = -(-1)^{\nu-2} = (-1)^{\nu-1},
\end{aligned}$$

which is (21), as desired. For integral $b_i$ we at once have $(A_\nu, B_\nu) = 1$, so that the fractions $A_\nu/B_\nu$ are reduced.

As an estimate for the $B_\nu$, it follows from $B_0 > 0$, $B_1 = b_1 \ge 1$, $B_2 = b_1 b_2 + 1 \ge 2$, by complete induction for $\nu \ge 3$, that

$$B_\nu = b_\nu B_{\nu-1} + B_{\nu-2} \ge \nu - 1 + \nu - 2 = \nu + (\nu - 3) \ge \nu,$$

and thus

$$B_\nu \ge \nu \quad \text{for all } \nu \ge 0. \tag{22}$$

If we now define the nonterminating regular continued fraction $[b_0, b_1, b_2, \ldots]$ as the sequence of partial quotients $(A_n/B_n) = [b_0, b_1, \ldots, b_n]$, we have the following theorem.


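Convergence theorem: Every nonterminating regular continued fraction is convergent.

Proof: Since, by (21),

$$\frac{A_n}{B_n} = \frac{A_0}{B_0} + \sum_{\nu=1}^{n} \left(\frac{A_\nu}{B_\nu} - \frac{A_{\nu-1}}{B_{\nu-1}}\right) = b_0 + \sum_{\nu=1}^{n} \frac{A_\nu B_{\nu-1} - A_{\nu-1} B_\nu}{B_\nu B_{\nu-1}} = b_0 + \sum_{\nu=1}^{n} \frac{(-1)^{\nu-1}}{B_\nu B_{\nu-1}}$$

and, by (22),

$$b_0 + \sum_{\nu=1}^{\infty} \frac{1}{B_\nu B_{\nu-1}} \le b_0 + 1 + \sum_{\nu=2}^{\infty} \frac{1}{\nu(\nu - 1)} \quad (= b_0 + 1 + 1),$$

it follows that $\sum_{\nu=1}^{\infty} [(A_\nu/B_\nu) - (A_{\nu-1}/B_{\nu-1})]$ is absolutely convergent, and thus also convergent.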

For the proof of the following theorem we need a slight generalization of the Euclidean algorithm for real numbers. For every real number $\rho$ we let $[\rho]$ denote the greatest integer $g \le \rho$, $g = [\rho] \le \rho < [\rho] + 1$. Such a $g$ always exists, since the field of real numbers has an Archimedean ordering (see IB1, §4.1), which means that for every real $\rho > 0$ there exists a natural number $n$ such that $n \cdot 1 > \rho$. In the set of all such $n$ there is a smallest, which we put equal to $g + 1$. Then obviously $g = [\rho]$. For a negative nonintegral $\rho$ we have $[\rho] = -([-\rho] + 1)$ with the desired properties; for an integer $\rho$ it is clear that $[\rho] = \rho$. In the division with smallest positive remainder for the ring $\mathfrak{C}$, say $a = qb + r$ with $\rho = a/b = q + r/b$, we have $q = [\rho]$, since $0 \le r/b = \theta < 1$, and therefore $\rho = [\rho] + \theta$, $0 \le \theta < 1$. From the definition of $[\rho]$ it is clear that every real $\rho$ admits such a formula, which is the desired generalization.


Expansion theorem: Every real number can be expanded in exactly one way as a regular continued fraction.


Proof: Set $b_0 = [\rho]$ and $\rho = b_0 + 1/\eta_1$; thus $\eta_1 > 1$ if $\rho$ is not already an integer. Furthermore, let $\eta_0 = \rho$, $b_1 = [\eta_1]$, $\eta_1 = b_1 + 1/\eta_2$, so that

$$\rho = b_0 + \cfrac{1}{b_1 + \cfrac{1}{\eta_2}} = [b_0, b_1, \eta_2];$$

proceeding in this way we obtain a sequence, terminating or nonterminating, of integers $b_\nu$ such that

$$\rho = [b_0, b_1, \ldots, b_{\nu-1}, \eta_\nu]. \tag{23}$$

From (20) and (21) for $\nu \ge 1$ it follows in the nonterminating case

$$\left|\rho - \frac{A_{\nu-1}}{B_{\nu-1}}\right| = \left|\frac{\eta_\nu A_{\nu-1} + A_{\nu-2}}{\eta_\nu B_{\nu-1} + B_{\nu-2}} - \frac{A_{\nu-1}}{B_{\nu-1}}\right| = \frac{|A_{\nu-2} B_{\nu-1} - A_{\nu-1} B_{\nu-2}|}{B_{\nu-1}(\eta_\nu B_{\nu-1} + B_{\nu-2})} = \frac{1}{B_{\nu-1}(\eta_\nu B_{\nu-1} + B_{\nu-2})} < \frac{1}{B_{\nu-1} B_\nu}, \tag{24}$$

from which, in view of (22), we see at once that $\lim_{n \to \infty} A_n/B_n = \rho$, so that $\rho = [b_0, b_1, \ldots]$. With respect to the uniqueness, we note that for (23) we have in the terminating case

$$\eta_\nu = [b_\nu, b_{\nu+1}, \ldots, b_n], \qquad 0 \le \nu \le n,$$

and in the nonterminating case $\eta_\nu = [b_\nu, b_{\nu+1}, \ldots]$, $\nu \ge 0$, so that $\eta_\nu > 1$ for all $\nu \ge 1$ (where in the terminating case we must take account of the normalization $b_n > 1$). Thus in $\eta_\nu = [b_\nu, \eta_{\nu+1}] = b_\nu + 1/\eta_{\nu+1}$ $(\nu \ge 0)$ we necessarily have $b_\nu = [\eta_\nu]$, so that $b_\nu$ is uniquely determined by $\eta_\nu$; in particular, $b_0$ is uniquely determined by the value $\eta_0 = \rho$. If we now assume the uniqueness of the numbers $b_0, b_1, \ldots, b_{\nu-1}$, then by (23) $\eta_\nu$ is also uniquely determined and thus, as we have just seen, so is $b_\nu$.

Since by (15) rational numbers have a terminating continued fraction, the preceding theorem gives us this result: terminating regular continued fractions represent rational numbers, and nonterminating continued fractions represent irrational numbers.
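
For a rational number the expansion procedure of the proof terminates and can be carried out exactly. The sketch below uses Python's `Fraction` type; the function names `expand` and `evaluate` are hypothetical. An irrational number cannot be represented exactly in this way, so the sketch is limited, by assumption, to rational input.

```python
from fractions import Fraction
from math import floor


def expand(rho: Fraction) -> list[int]:
    """Regular continued fraction [b_0, b_1, ..., b_n] of a rational number."""
    terms = []
    eta = Fraction(rho)
    while True:
        b = floor(eta)          # b_v = [eta_v]
        terms.append(b)
        theta = eta - b         # 0 <= theta < 1
        if theta == 0:
            return terms
        eta = 1 / theta         # eta_{v+1} > 1


def evaluate(terms: list[int]) -> Fraction:
    """Value of [b_0, b_1, ..., b_n], computed from the last term backwards."""
    value = Fraction(terms[-1])
    for b in reversed(terms[:-1]):
        value = b + 1 / value
    return value
```

For example, `expand(Fraction(355, 113))` gives `[3, 7, 16]`, and `evaluate([3, 7, 16])` returns 355/113 again. Because the loop stops as soon as the fractional part vanishes, the last term is automatically greater than 1 whenever there is more than one term, which is the normalization used in the text.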

From (21) it follows from division by $B_{\nu-1} B_\nu$ for $\nu \ge 1$ that

$$\frac{A_\nu}{B_\nu} - \frac{A_{\nu-1}}{B_{\nu-1}} = \frac{(-1)^{\nu-1}}{B_\nu B_{\nu-1}},$$

so that the sequence of first differences of the sequence $A_\nu/B_\nu$ is alternating; taking into account the inequalities

$$B_\nu = b_\nu B_{\nu-1} + B_{\nu-2} \;\begin{cases} > B_{\nu-1} & \text{for } \nu \ge 2 \text{ and } \nu = 0,\\ \ge B_{\nu-1} & \text{for all } \nu \ge 0, \end{cases}$$

which show that $1/(B_{\nu-1} B_\nu)$ is strictly decreasing, we see from

$$\frac{A_0}{B_0} < [b_0, b_1, \ldots] = k$$

that

$$\frac{A_0}{B_0} < \frac{A_2}{B_2} < \cdots < \frac{A_{2n}}{B_{2n}} < \cdots < k < \cdots < \frac{A_3}{B_3} < \frac{A_1}{B_1}.$$

The recursion formulas (20) provide us with a simple setup for practical calculation of the partial quotients $(A_\nu/B_\nu)$ of $[b_0, b_1, \ldots]$:

    ν   | -2 | -1 | 0   | 1          | ... | ν-2     | ν-1     | ν
    b_ν |  - |  - | b_0 | b_1        | ... | b_{ν-2} | b_{ν-1} | b_ν
    A_ν |  0 |  1 | b_0 | b_1 b_0 + 1 | ... | A_{ν-2} | A_{ν-1} | A_ν = b_ν A_{ν-1} + A_{ν-2}
    B_ν |  1 |  0 | 1   | b_1·1 + 0  | ... | B_{ν-2} | B_{ν-1} | B_ν = b_ν B_{ν-1} + B_{ν-2}

The last two lines are calculated, independently of each other, by the same rule (see the last column). For example, if we wish to calculate $\pi \approx 3.14159265358$, the Euclidean algorithm gives $\pi = [3, 7, 15, 1, 292, 1, 1, 1, \ldots]$; and the corresponding setup is:

    ν     | -2 | -1 | 0         | 1         | 2         | 3         | 4
    b_ν   |  - |  - | 3         | 7         | 15        | 1         | 292
    A_ν   |  0 |  1 | 3         | 22        | 333       | 355       | 103993
    B_ν   |  1 |  0 | 1         | 7         | 106       | 113       | 33102
    Error |  - |  - | +1.4·10⁻¹ | -1.3·10⁻³ | +0.8·10⁻⁴ | -2.7·10⁻⁷ | +0.6·10⁻⁹

Here the well-known approximations 3, 22/7, ... stand one under the other. It is remarkable that the slight increase from $A_2$ to $A_3$ and $B_2$ to $B_3$ produces a considerable improvement in the accuracy, whereas the far greater increase involved in passing from $\nu = 1$ to $\nu = 2$ and from $\nu = 3$ to $\nu = 4$ fails to produce any correspondingly great improvement in the approximation. This phenomenon finds its general explanation in the formula (24). With the increase of $\eta_\nu$, and therefore of $b_\nu$, the approximation $A_{\nu-1}/B_{\nu-1}$ is improved.
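
The tabular setup corresponds to a simple loop over the recursion formulas (20) with the starting values (19). The following sketch reproduces the convergents of $[3, 7, 15, 1, 292]$; the function name `convergents` is hypothetical, and the comparison with $\pi$ relies on the floating-point constant `math.pi`, so the printed errors are only approximate.

```python
from math import pi


def convergents(terms: list[int]) -> list[tuple[int, int]]:
    """Pairs (A_v, B_v) for v = 0, 1, ..., computed by the recursion (20)."""
    a_prev2, a_prev1 = 0, 1   # A_{-2}, A_{-1} from (19)
    b_prev2, b_prev1 = 1, 0   # B_{-2}, B_{-1} from (19)
    result = []
    for b in terms:
        a_cur = b * a_prev1 + a_prev2
        b_cur = b * b_prev1 + b_prev2
        result.append((a_cur, b_cur))
        a_prev2, a_prev1 = a_prev1, a_cur
        b_prev2, b_prev1 = b_prev1, b_cur
    return result


table = convergents([3, 7, 15, 1, 292])
for v, (a_v, b_v) in enumerate(table):
    # Error column of the setup: pi minus the v-th partial quotient.
    print(v, a_v, b_v, f"{pi - a_v / b_v:+.1e}")
```

The pairs obtained are (3, 1), (22, 7), (333, 106), (355, 113), (103993, 33102), and consecutive pairs satisfy the identity (21).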


3.2. One of the essential properties of the above partial quotients of a
number $k$ rests on the fact that among all rational numbers these partial
quotients best approximate the number $k$, in the sense of the following
definition: $p/q$ is called a best approximation to $k$ if from

$$\left|\frac{a}{b} - k\right| \le \left|\frac{p}{q} - k\right|
\quad\text{and}\quad \frac{a}{b} \ne \frac{p}{q}
\quad\text{it follows that}\quad b > q.$$


In other words, in order to make a better approximation than the
above-defined best approximation, we must have recourse to larger
denominators.^17 Now we can show that all the partial quotients $A_\nu/B_\nu$
are best approximations to $k$. In order to find all the best approximations
to $k$, we must also consider the so-called subsidiary partial quotients (the
$A_\nu/B_\nu$ are then called the principal partial quotients)

$$N_{\lambda,\rho} = \frac{A_{\lambda-2} + \rho A_{\lambda-1}}{B_{\lambda-2} + \rho B_{\lambda-1}}.$$

It is easy to show that the $N_{\lambda,\rho}$ lie in the open interval between
$A_{\lambda-2}/B_{\lambda-2}$ and $A_\lambda/B_\lambda$ and change monotonically
for $\rho = 1, 2, \ldots, b_\lambda - 1$ ($\lambda \ge 1$). Then those
$N_{\lambda,\rho}$ for which

$$\left|k - \frac{A_{\lambda-2}}{B_{\lambda-2}}\right| > \left|k - N_{\lambda,\rho}\right|
> \left|k - \frac{A_{\lambda-1}}{B_{\lambda-1}}\right|$$

are certainly not best approximations, since the $A_{\lambda-1}/B_{\lambda-1}$
are better approximations and have a smaller denominator. More precisely, it
can be shown (see Perron [3]) that we obtain

for $\rho < \tfrac{1}{2} b_\lambda$ no best approximations,
for $\rho > \tfrac{1}{2} b_\lambda$ best approximations,
and for $\rho = \tfrac{1}{2} b_\lambda$ a best approximation if and only if
$[b_\lambda, b_{\lambda-1}, \ldots, b_1] > [b_\lambda, b_{\lambda+1}, \ldots]$.


17 The problem of finding best approximations is of practical importance in the
technology of power machinery, where the gears should approximate a given
transmission ratio as closely as possible, but with the smallest possible
number of cogs.


3.3. A continued fraction $k = [b_0, b_1, \ldots]$ is said to be periodic if
there exist two numbers $n$ and $p$ such that $b_{n+\lambda p+\kappa} = b_{n+\kappa}$
for all $\lambda \ge 0$ and all $\kappa = 0, 1, \ldots, p - 1$; in analogy with
the notation for periodic decimals we then write
$k = [b_0, \ldots, b_{n-1}, \overline{b_n, \ldots, b_{n+p-1}}]$. If the choice of
$p$ and $n$ is minimal, $p$ is called the (primitive) period and
$[b_0, b_1, \ldots, b_{n-1}]$ the preperiodic part; if this latter part is
missing, we speak of a purely periodic (or simply periodic) continued
fraction. The following theorem is due to Euler:


If the continued fraction expansion of a number $k$ is periodic, then $k$ is the
solution of a quadratic equation which is irreducible in $P[x]$^18 and has
integral coefficients (in other words, $k$ is a quadratic irrationality, i.e.,
an algebraic number of second degree).


Proof: Every real quadratic irrationality has the form $(a + \sqrt{b})/c$
($a, b, c$ integers, $b > 0$, $b$ not a perfect square), and conversely every
number of this form is a quadratic irrationality. Thus it is sufficient to
show that a purely periodic continued fraction
$\eta_n = [\overline{b_n, \ldots, b_{n+p-1}}]$ is a quadratic irrationality; in
other words, we assume $n = 0$. But then

$$\eta_p = [\overline{b_p, \ldots, b_{2p-1}}] = [\overline{b_0, \ldots, b_{p-1}}] = \eta_0 = k,$$

$$k = [b_0, \ldots, b_{p-1}, \eta_p]
= \frac{\eta_p A_{p-1} + A_{p-2}}{\eta_p B_{p-1} + B_{p-2}}
= \frac{k A_{p-1} + A_{p-2}}{k B_{p-1} + B_{p-2}},$$

so that $k$ is the solution of the equation

$$B_{p-1} x^2 + (B_{p-2} - A_{p-1}) x - A_{p-2} = 0.$$

Of great importance is the converse of this theorem, due to Lagrange
(see Perron [2], [3]):
Every quadratic irrationality has a periodic continued fraction expansion.


18 $P$ is the field of rational numbers.
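
As an illustration of Euler's theorem (a worked example that is not part of
the original text), take the purely periodic continued fraction
$k = [\overline{1, 2}]$, so that $p = 2$, $b_0 = 1$, $b_1 = 2$. With the usual
starting values $A_{-1} = 1$, $B_{-1} = 0$ we find

$$A_0 = 1, \quad B_0 = 1, \quad A_1 = b_1 A_0 + A_{-1} = 3, \quad B_1 = b_1 B_0 + B_{-1} = 2,$$

and the quadratic equation of the proof becomes

$$B_1 x^2 + (B_0 - A_1) x - A_0 = 2x^2 - 2x - 1 = 0,$$

whose positive root is $k = (1 + \sqrt{3})/2 \approx 1.366$. As a check,
$1 + 1/(2 + 1/k) = (3k + 1)/(2k + 1) = k$ is equivalent to $2k^2 - 2k - 1 = 0$.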


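3.4. A generalization of periodic continued fractions is due to Hurwitz.
Let each of the $l$ rows

$$\begin{matrix}
a_1^{(1)}, & a_2^{(1)}, & a_3^{(1)}, & \ldots \\
a_1^{(2)}, & a_2^{(2)}, & a_3^{(2)}, & \ldots \\
\vdots & & & \\
a_1^{(l)}, & a_2^{(l)}, & a_3^{(l)}, & \ldots
\end{matrix} \tag{25}$$

be an arithmetic progression of arbitrary order. Then

$$[b_0, b_1, \ldots, b_{n-1}, a_1^{(1)}, a_1^{(2)}, \ldots, a_1^{(l)}, a_2^{(1)}, \ldots, a_2^{(l)}, \ldots]
= [b_0, \ldots, b_{n-1}, \overline{a_\lambda^{(1)}, a_\lambda^{(2)}, \ldots, a_\lambda^{(l)}}]_{\lambda=1}^{\infty}$$

is called a Hurwitz continued fraction. If all the sequences in (25) have the
order zero, in other words, if they are sequences of constants, the result is
obviously a periodic continued fraction. The following theorem is due to
Hurwitz (see Perron [2], [3]):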

For numbers $\eta$ and $\xi$ related by

$$\eta = \frac{a\xi + b}{c\xi + d}, \qquad ad - bc \ne 0; \quad a, b, c, d \text{ rational},^{19}$$

if $\xi$ is a Hurwitz continued fraction, so also is $\eta$ and conversely;
moreover, the number of arithmetic progressions of $n$th order for every
$n > 0$ is the same in both cases.^20

As an example of a Hurwitz continued fraction let us consider

$$k = [\overline{(2n+1)b}]_{n=0}^{\infty} = [b, 3b, 5b, \ldots];$$

here we have

$$k = \frac{e^{1/b} + e^{-1/b}}{e^{1/b} - e^{-1/b}};$$

taking $b = 2$, we see from the preceding theorem that $e$ also has an
expansion in Hurwitz continued fractions; we have

$$e = [2, \overline{1, 2n, 1}]_{n=1}^{\infty} = [2, 1, 2, 1, 1, 4, 1, 1, 6, 1, \ldots].$$


19 By taking the fractions over a least common denominator, we may confine
ourselves to rational integers $a, b, c, d$.

20 Here it may be necessary to break up a sequence $a_1, a_2, \ldots$, say into
$a_1, a_3, a_5, \ldots$ and $a_2, a_4, a_6, \ldots$ and so forth. Constant
sequences may be inserted or removed.
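
Both expansions in this example can be checked numerically. The sketch below
is only an illustration in Python: the helper names are hypothetical, and the
comparison uses floating-point values of $e$, so it confirms agreement to
machine precision rather than proving anything.

```python
import math
from fractions import Fraction
from itertools import islice

def e_partial_denominators():
    """Yield 2, then the blocks 1, 2n, 1 for n = 1, 2, 3, ..."""
    yield 2
    n = 1
    while True:
        yield 1
        yield 2 * n
        yield 1
        n += 1

def evaluate(terms):
    """Evaluate the finite continued fraction [t_0, t_1, ..., t_k] from the right."""
    value = Fraction(terms[-1])
    for t in reversed(terms[:-1]):
        value = t + 1 / value
    return value

e_terms = list(islice(e_partial_denominators(), 20))
e_approx = evaluate(e_terms)

# The case b = 2 of [b, 3b, 5b, ...], which should equal (e + 1)/(e - 1).
k_approx = evaluate([2 * (2 * n + 1) for n in range(8)])

print(e_terms[:10])
print(float(e_approx), float(k_approx))
```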


The regular continued fraction expansion for $\pi$ is unknown. On the other
hand,


or 


are expansions in the form (16) (see Perron [3]). 


4. Congruences 


4.1. Let $\mathfrak{G}$ be an Abelian group (see IB2, §1.1), and let
$\mathfrak{U}$ be a subgroup of $\mathfrak{G}$. As was shown in IB2, §3.5, every
group $\mathfrak{G}$ can be represented as the union of disjoint cosets (residue
classes) $g\mathfrak{U}$. The property of two elements $g_1, g_2$ of belonging to
the same coset is obviously an equivalence relation.^21 This relation is
called a congruence and is written

$$g_1 \equiv g_2 \pmod{\mathfrak{U}} \quad\text{or}\quad g_1 \equiv g_2\ (\mathfrak{U}); \tag{26}$$

in words: $g_1$ is congruent to $g_2$ modulo $\mathfrak{U}$. From
$g_1 \in g_2\mathfrak{U}$ follows the existence of a $u \in \mathfrak{U}$ such that
$g_1 = g_2 u$, and therefore $g_2^{-1} g_1 = u$, or in other words
$g_2^{-1} g_1 \in \mathfrak{U}$; and conversely, $g_2^{-1} g_1 \in \mathfrak{U}$
implies that $g_1 \equiv g_2\ (\mathfrak{U})$ or, in other words, if $e$ is the
unity of $\mathfrak{G}$, we have $g_2^{-1} g_1 \equiv e\ (\mathfrak{U})$. Since
$\mathfrak{G}$ is Abelian, it follows from $g_2^{-1} g_1 \in \mathfrak{U}$ and
$g_4^{-1} g_3 \in \mathfrak{U}$ that
$g_2^{-1} g_1 \cdot g_4^{-1} g_3 = (g_2 g_4)^{-1} g_1 g_3 \in \mathfrak{U}$, i.e.,
$g_1 g_3 \equiv g_2 g_4\ (\mathfrak{U})$; in other words: congruences
$g_1 \equiv g_2\ (\mathfrak{U})$ and $g_3 \equiv g_4\ (\mathfrak{U})$ may be
multiplied, and naturally also divided. If the operation of the group is
written as addition, i.e., if $\mathfrak{G}$ is a module, then (26) is obviously
equivalent to $g_1 - g_2 \in \mathfrak{U}$ or $g_1 - g_2 \equiv 0\ (\mathfrak{U})$,
and then the congruences can be added and subtracted.

If $\mathfrak{G}$ is the additive group of a ring $\mathfrak{R}$,
$\mathfrak{G} = \mathfrak{R}^+$, and if the submodule $\mathfrak{U}$ is an ideal
$\mathfrak{a}$ in $\mathfrak{R}^+$, we have the following theorem:


Theorem 1: Congruences mod $\mathfrak{a}$ may be added, subtracted, and
multiplied.


21 That is, it is reflexive, symmetric, and transitive (cf. IA, §8.3 and §8.5).

Proof: Let $a_1 \equiv a_2\ (\mathfrak{a})$ and $a_3 \equiv a_4\ (\mathfrak{a})$;
from $a_1 - a_2 \in \mathfrak{a}$ and $a_3 \in \mathfrak{R}$ it follows [see §2.5
(II)] that $(a_1 - a_2) a_3 \in \mathfrak{a}$, so that
$a_1 a_3 \equiv a_2 a_3\ (\mathfrak{a})$; similarly,
$a_2 a_3 \equiv a_2 a_4\ (\mathfrak{a})$ and therefore, by the transitivity,
$a_1 a_3 \equiv a_2 a_4\ (\mathfrak{a})$.

If $\mathfrak{a} = (m)$ is a principal ideal, we also write
$a \equiv b \pmod{m}$, or $a \equiv b\ (m)$. In this case, assuming again that
$\mathfrak{R}$ has a unity element, the congruence is also equivalent to
$m \mid (a - b)$, or in other words to the existence of a
$\lambda \in \mathfrak{R}$ with $a = b + \lambda m$. In particular, if
$\mathfrak{R}$ is Euclidean (see §2.10), the division formula $a = qm + r$
shows that $a \equiv r\ (m)$, so that numbers are congruent if and only if they
leave the same remainder when divided by the modulus. From the equality of
the ideals $(m) = (\varepsilon m)$ for every unit $\varepsilon$ it follows that
the congruences $a \equiv b\ (m_1)$ and $a \equiv b\ (m_2)$ are equivalent if
$m_1 \sim m_2$. Finally, $a \equiv b \pmod{1}$ is trivially true for arbitrary
$a, b$. If $\mathfrak{a} = (0)$ is the zero ideal, then $a \equiv b\ (0)$ is
equivalent to $a = b$, as can be seen at once.

Theorem 1 can be interpreted in another way. If we consider the residue
classes mod $\mathfrak{a}$ as elements of a new set, to be denoted by
$\mathfrak{R}/\mathfrak{a}$ (read $\mathfrak{R}$ with respect to $\mathfrak{a}$),
then $\mathfrak{R}/\mathfrak{a}$ is a ring, for which we can define the sum of
two residue classes $\bar{a}, \bar{b}$ as $\bar{a} + \bar{b} = \overline{a + b}$,
and the product by $\bar{a}\bar{b} = \overline{ab}$, with $a \in \bar{a}$,
$b \in \bar{b}$. Then theorem 1 states that these definitions are independent
of the choice of the elements $a$ and $b$ from $\bar{a}$ and $\bar{b}$. The
axioms for a ring can be verified at once (cf. IB5, §3.6). In the terminology
of group theory, the additive group $(\mathfrak{R}/\mathfrak{a})^+$ is precisely
the factor group (or factor module) $\mathfrak{R}^+/\mathfrak{a}$ (see IB2,
§6.3). The ring $\mathfrak{R}/\mathfrak{a}$ is called the residue class ring
modulo $\mathfrak{a}$. The mapping $a \to \bar{a}$ is a ring homomorphism (see
IB5, §3.6). If we choose one element from each residue class, the set of these
representatives is called a complete residue system mod $\mathfrak{a}$. For
example, $\{0, 1, \ldots, m - 1\}$ and $\{0, -1, -2, -3, \ldots, -(m - 1)\}$ are
complete residue systems mod $m$ in the ring $\mathfrak{C}$, the former being
called the smallest positive residue system. In general, residue class rings
have divisors of zero; for example, $\bar{2}, \bar{3}$ are divisors of zero in
$\mathfrak{C}/(6)$, since $2 \cdot 3 \equiv 0\ (6)$, i.e.,
$\bar{2} \cdot \bar{3} = \bar{0}$. Below we shall also write $a \bmod m$ instead
of $\bar{a}$.
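
The two points just made, that sums and products of residue classes do not
depend on the chosen representatives and that residue class rings can have
divisors of zero, can be seen in a small experiment. This is a minimal
sketch in Python for the integers mod 6; the variable and function names are
illustrative only.

```python
m = 6
residues = range(m)

# Pairs of nonzero classes whose product is the zero class.
zero_divisor_pairs = [(a, b) for a in residues for b in residues
                      if a and b and (a * b) % m == 0]
zero_divisors = sorted({a for a, _ in zero_divisor_pairs})

def same_class(x, y, modulus=m):
    """True when x and y lie in the same residue class."""
    return (x - y) % modulus == 0

# Replace a, b by other representatives a + 6s, b + 6t and compare classes.
representative_check = all(
    same_class((a + m * s) + (b + m * t), a + b)
    and same_class((a + m * s) * (b + m * t), a * b)
    for a in residues for b in residues
    for s in (-2, 1, 5) for t in (-3, 0, 4)
)

print(zero_divisors, representative_check)
```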


4.2. Let $\mathfrak{R}$ be a principal ideal ring (see IB5, §3.4). Then:
From $a \equiv b\ (m)$ follows $(a, m) = (b, m)$.


Proof: As was shown in 4.1, the congruence $a \equiv b\ (m)$ is equivalent to
an equation of the form $a = b + \lambda m$. It follows that $(a, m) \mid b$ and
$(a, m) \mid m$, so that $(a, m) \mid (b, m)$; similarly, $(b, m) \mid (a, m)$
and thus $(a, m) = (b, m)$.

The element $(a, m) = d$ is called the greatest divisor of the residue class
$\bar{a} = a \bmod m$. If $d = 1$, the class $a \bmod m$ is called a
(relatively) prime residue class mod $m$. A system of representatives of the
prime residue classes mod $m$ is called a reduced residue system mod $m$.
If $m$ is a prime element, the nonzero residue classes coincide with the
prime residue classes.


Theorem 2: In a principal ideal ring $\mathfrak{R}$ the prime residue classes
mod $m$, with arbitrary $m$, form a group $\mathfrak{G}_m$, the prime residue
class group mod $m$.


Proof: If $a \bmod m$ and $b \bmod m$ are prime residue classes, then
$(a, m) = (b, m) = 1$, so that $(ab, m) = 1$, i.e., $ab \bmod m$ is also a prime
residue class. Thus we need only show the existence of a residue class
inverse to $a \bmod m$. From $(a, m) = 1$ follows the existence of elements
$x_0, y_0$ in $\mathfrak{R}$ such that $a x_0 + m y_0 = 1$, and thus
$a x_0 \equiv 1\ (m)$, so that $x_0 \bmod m$ is the desired inverse residue
class.
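
In the ring of integers the elements $x_0, y_0$ of this proof can be computed
by the extended Euclidean algorithm. The following is a minimal sketch in
Python, assuming positive integers $a$ and $m$; the function names are
illustrative and not taken from the text.

```python
def extended_gcd(a, m):
    """Return (d, x0, y0) with a*x0 + m*y0 == d == gcd(a, m)."""
    old_r, r = a, m
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

def inverse_mod(a, m):
    """Representative in 0..m-1 of the class inverse to a mod m."""
    d, x0, _ = extended_gcd(a, m)
    if d != 1:
        raise ValueError("a mod m is not a prime residue class")
    return x0 % m

def solve_linear_congruence(a, b, m):
    """The unique solution class of a*x = b (mod m) when gcd(a, m) == 1."""
    return (b * inverse_mod(a, m)) % m

print(extended_gcd(7, 30), inverse_mod(7, 30), solve_linear_congruence(7, 4, 30))
```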

In $\mathfrak{C}$ the group $\mathfrak{G}_m$ obviously has finitely many
elements; the order $|\mathfrak{G}_m|$ of this group is denoted by $\varphi(m)$
and is called the Euler function.

Theorem 2 states only that the congruence $ax \equiv b\ (m)$ is solvable for
$a$ and $b$ prime to $m$. But if $(a, m) = 1$, the congruence
$ax \equiv b\ (m)$ is uniquely solvable for arbitrary $b$; for it is obvious
that $b x_0 \bmod m$ is a solution if $x_0 \bmod m$ is inverse to $a \bmod m$,
and from $a x_1 \equiv a x_2 \equiv b\ (m)$ follows
$a(x_1 - x_2) \equiv 0\ (m)$, so that $m \mid (x_1 - x_2)$, i.e.,
$x_1 \bmod m = x_2 \bmod m$ (see also §7.2).

If the order $|\mathfrak{G}_m|$ of the prime residue class group
$\mathfrak{G}_m$ is finite, the group-theoretic theorem that the order of a
subgroup is a factor of the order of the group (see IB2, §3.5) provides us
with the (generalized) lesser Fermat theorem:

From $(a, m) = 1$ follows $a^{|\mathfrak{G}_m|} \equiv 1\ (m)$.

Proof: The element $a \bmod m = \bar{a} \in \mathfrak{G}_m$ generates the cyclic
subgroup $(\bar{a}) = \{\bar{a}, \bar{a}^2, \ldots, \bar{a}^{k-1}, \bar{a}^k = \bar{1}\}$;
for its order $k$ we have $k \mid |\mathfrak{G}_m|$, so that
$\bar{a}^{|\mathfrak{G}_m|} = \bar{1}$ or, written as a congruence,
$a^{|\mathfrak{G}_m|} \equiv 1\ (m)$.

The number $k$ is also called the order of the element $\bar{a}$, or of
$a \bmod m$.^22

We have just now, and also earlier, made use of the obvious but essential
fact that congruences between numbers of the original ring $\mathfrak{R}$ are
equivalent to equations between residue classes. When we are passing from
equations to congruences, we may choose arbitrary representatives of any
given residue class, in view of the fact that by theorem 1 the sum and
product of residue classes are independent of the choice of representatives;
in other words: in a congruence any element of the ring may be replaced by
any element congruent to it (when speaking of a power $a^n$ we must of course
consider $n$ not as an element of the ring but as an operator).

In the ring $\mathfrak{C}$ the Fermat theorem obviously has the form

$$a^{\varphi(m)} \equiv 1\ (m), \quad\text{if } (a, m) = 1.$$
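
For the integers, both the Fermat theorem in this form and the fact that the
order of each element divides $\varphi(m)$ can be confirmed by direct
computation. The sketch below is an illustration in Python with hypothetical
helper names; $\varphi(m)$ is obtained by simply counting the prime residue
classes.

```python
from math import gcd

def euler_phi(m):
    """Order of the prime residue class group mod m, by direct counting."""
    return sum(1 for a in range(1, m + 1) if gcd(a, m) == 1)

def order_mod(a, m):
    """Smallest k >= 1 with a**k congruent to 1 mod m; requires gcd(a, m) == 1."""
    if gcd(a, m) != 1:
        raise ValueError("a must be prime to m")
    k, power = 1, a % m
    while power != 1 % m:
        power = (power * a) % m
        k += 1
    return k

m = 20
phi = euler_phi(m)
units = [a for a in range(1, m) if gcd(a, m) == 1]
fermat_holds = all(pow(a, phi, m) == 1 for a in units)
orders_divide_phi = all(phi % order_mod(a, m) == 0 for a in units)

print(phi, {a: order_mod(a, m) for a in units})
```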


22 Instead of "$a$ has the order $k$ mod $m$," it was customary in the older
literature on the ring $\mathfrak{C}$ of integers to say "$a$ belongs mod $m$ to
the exponent $k$."

In particular, if $m = p \ne 0$ is a prime, we have $a^{p-1} \equiv 1\ (p)$ for
$p \nmid a$; and thus for all $a$ without restriction,

$$a^p \equiv a\ (p), \quad a \text{ an arbitrary integer},$$

which for $p \nmid a$ is equivalent to $a^{p-1} \equiv 1\ (p)$, since in a
congruence we may always cancel any number prime to the modulus (i.e., the
inverse residue class exists).

If $a \bmod m$ is a nonprime residue class, so that $(a, m) = d \nsim 1$, and if
we set $a = a_1 d$, $m = m_1 d$, we have

$$a m_1 = a_1 d m_1 = a_1 m \equiv 0\ (m), \qquad m_1 \not\equiv 0\ (m),$$

i.e., every nonprime residue class is a divisor of zero in $\mathfrak{R}/(m)$
and thus has no inverse. Since the prime residue classes coincide with the
elements of $\mathfrak{R}/(m)$ that are not divisors of zero, we have the
result:

In $\mathfrak{R}/(m)$ an element has an inverse if and only if it is not a
divisor of zero.


4.3. Now let the modulus be a prime element $p$ in $\mathfrak{R}$. Then the
zero class is obviously the only divisor of zero in $\mathfrak{R}/(p)$, so that
$\mathfrak{R}/(p)$ is an integral domain, and in fact a field, since all
nonzero elements have inverses.


Theorem 3: If $p \ne 0$ is a prime element in the principal ideal ring
$\mathfrak{R}$, the residue class ring $\mathfrak{R}/(p)$ is a field, the
so-called residue class field modulo $p$.


Furthermore (see §2.10) we have:


Theorem 4: The polynomial ring $\mathfrak{R}/(p)[x]$, $p \ne 0$ and prime, is
Euclidean.


In the following discussion (leading up to the Wilson theorem) it must
be remembered that a polynomial over a field cannot have a number of
zeros greater than its degree, and that in the canonical factorization the
factor $(x - \alpha)^m$ necessarily appears if $\alpha$ is a zero of $m$th order
(see IB4, §2.2). The fact, emphasized in §4.2, that congruences are
interchangeable with equations in the residue class ring means for
polynomials $f(x) = \sum_{i=0}^{n} \bar{a}_i x^i$ and
$g(x) = \sum_{i=0}^{s} \bar{b}_i x^i$ in $\mathfrak{R}/(m)[x]$^24 that the
identity $f(x) = g(x)$, i.e., the equations $\bar{a}_i = \bar{b}_i$
($i = 0, 1, \ldots, n$), for the coefficients with $n = s$, has exactly the same
significance as $f(x) \equiv g(x)\ (m)$, which means in turn that $n = s$ and
$a_i \equiv b_i\ (m)$, $i = 0, 1, \ldots, n$^25 (comparison of coefficients
mod $m$).


23 If $p$ is a unit, $\mathfrak{R}/(p)$ $(= \mathfrak{R}/(1))$ consists of the
zero class alone. Depending on whether we wish to consider the ring
consisting of zero alone as a (trivial) field (the zero field) we will
include or omit values for $p$ that are units. The polynomial ring
$\mathfrak{R}/(1)[x]$ also consists solely of the zero element.

24 This result also holds in general for $\mathfrak{R}/\mathfrak{a}$, where
$\mathfrak{a}$ is an arbitrary ideal in a ring $\mathfrak{R}$.

25 See IB4, §2.1 for the difference in meaning between the statement
"$f(x) = g(x)$" expressing the fact that $f(x)$ and $g(x)$ are the same elements
(polynomials) in a

Now let $p > 1$ be a prime number. We consider the polynomial
$x^{p-1} - \bar{1} \in \mathfrak{C}/(p)[x]$. By the Fermat theorem all the prime
residue classes, which in the present case means all the nonzero residue
classes, are zeros of this polynomial; if we ask how many of them there are,
the answer is given by $\varphi(p) = p - 1 = \text{degree}\,(x^{p-1} - \bar{1})$.
Thus $x^{p-1} - 1 \bmod p$ splits into linear factors:

$$x^{p-1} - 1 \equiv \prod_{\nu=1}^{p-1} (x - \nu) \pmod{p}. \tag{27}$$


Comparison of coefficients mod $p$ in the absolute term shows that

$$-1 \equiv \prod_{\nu=1}^{p-1} (-\nu) = (-1)^{p-1} \prod_{\nu=1}^{p-1} \nu
= (-1)^{p-1} (p - 1)! \pmod{p}. \tag{28}$$

For $p > 2$, we have $p - 1 \equiv 0\ (2)$, so that $(-1)^{p-1} = +1$; if
$p = 2$, we still have $(-1)^{p-1} \equiv +1\ (2)$, since $+1 \equiv -1\ (2)$;
thus (28) gives us the following theorem:


Wilson's theorem: If $p$ is a prime, then $(p - 1)! \equiv -1\ (p)$, and
conversely.

For if $n$ is reducible, then $n \mid (n - 1)! + 1$ is obviously impossible.
But if in (27) we compare the other coefficients (the coefficients on the
right are the elementary symmetric polynomials of the zeros
$1, 2, \ldots, p - 1$) (cf. IB4, §2.4), we see that they are all
$\equiv 0\ (p)$, which gives the desired result.
It is easy to generalize the Wilson theorem to principal ideal rings and
groups $\mathfrak{G}_p$ of finite order ($p \ne 0$, $p \nsim 1$ and prime).
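
Both statements, the factorization (27) and Wilson's theorem with its
converse, are easy to test for small integers. The following Python sketch
is only an illustration; the function names are hypothetical, and the
factorial-based test is far too slow to be a practical primality test.

```python
from math import factorial

def is_prime_by_wilson(n):
    """True exactly when n > 1 and (n - 1)! is congruent to -1 mod n."""
    return n > 1 and (factorial(n - 1) + 1) % n == 0

def product_of_linear_factors_mod_p(p):
    """Coefficients (lowest degree first) of prod_{v=1}^{p-1} (x - v), reduced mod p."""
    coeffs = [1]
    for v in range(1, p):
        shifted = [0] + coeffs                      # x * f(x)
        scaled = [(-v) * c for c in coeffs] + [0]   # -v * f(x)
        coeffs = [(s + t) % p for s, t in zip(shifted, scaled)]
    return coeffs

wilson_primes = [n for n in range(2, 30) if is_prime_by_wilson(n)]
print(wilson_primes)
# For p = 7 this prints the coefficients of x^6 - 1 mod 7, i.e. [6, 0, 0, 0, 0, 0, 1].
print(product_of_linear_factors_mod_p(7))
```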


4.4. Prime Fields, the Characteristic of a Field
Now let us suppose that there exists a nontrivial subfield $\mathfrak{K}$ of
$\mathfrak{C}/(p)$ ($p > 1$ a prime); then certainly
$\{\bar{0}, \bar{1}\} \subset \mathfrak{K}$, and therefore

$$\bar{2} = \bar{1} + \bar{1}, \quad \bar{3} = \bar{1} + \bar{1} + \bar{1}, \quad \ldots, \quad
\overline{p - 1} = \sum_{\nu=1}^{p-1} \bar{1}$$

are also in $\mathfrak{K}$, so that $\mathfrak{K} = \mathfrak{C}/(p)$, in
contradiction to the assumption. Thus $\mathfrak{C}/(p)$ has no nontrivial
subfield.

A field $\mathfrak{K}$ is called a prime field if it has no nontrivial
subfield. If for a given field $\mathfrak{K}$ there exists a positive integer
$n$ such that

$$\sum_{\nu=1}^{n} 1 = n \cdot 1 = 0 \qquad (1 \in \mathfrak{K},\ n \in \mathfrak{C}),$$

then the smallest such number $n = \chi(\mathfrak{K})$ is called the
characteristic of $\mathfrak{K}$. If there is no such positive $n$, then the
characteristic $\chi(\mathfrak{K}) = 0$. The characteristic of an integral
domain with unity element is defined in the same way. The following theorem
is now obvious:


Theorem 4: $\mathfrak{C}/(p)$ is a prime field of characteristic $p$. The field
$P$ of rational numbers is a prime field and $\chi(P) = 0$.


polynomial ring and the statement that $f(x) = g(x)$ for all $x$ in a given set.
For example, $x^2 + 1$ and $x + 1$, regarded as polynomials in
$\mathfrak{C}/(2)[x]$, are distinct, although $x^2 + 1$ and $x + 1$ have the same
values for all $x \in \mathfrak{C}/(2)$. The identity theorem which is valid in
the real (or in the rational or complex) field is not valid in general.


As was pointed out in IB5, §1.11, the fields in theorem 4 are, up to
isomorphism (cf. IB5, §1.13), the only possible prime fields, since a
composite characteristic would imply the existence of divisors of zero. If
$\chi(\mathfrak{K}) = p \ne 0$, then for arbitrary $\alpha \in \mathfrak{K}$ we
obviously have $p \cdot \alpha = \sum_{\nu=1}^{p} \alpha = 0$. It is also
immediately obvious that the number of elements of a prime field
$\mathfrak{K}$ is equal to $\chi(\mathfrak{K})$, if $\chi(\mathfrak{K}) \ne 0$.
In an arbitrary field of prime characteristic $p$ it is clear, since
$p \mid \binom{p}{\nu}$, $\nu = 1, 2, \ldots, p - 1$, that the binomial theorem
takes the following simple form

$$(a + b)^p = a^p + b^p. \tag{29}$$
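
For the residue class fields of the integers mod $p$, the divisibility of the
binomial coefficients and the resulting form of the binomial theorem can be
checked directly. A minimal sketch in Python (3.8 or later, for `math.comb`);
the function names are illustrative only.

```python
from math import comb

def binomials_divisible(p):
    """Check that p divides C(p, v) for v = 1, ..., p - 1."""
    return all(comb(p, v) % p == 0 for v in range(1, p))

def simple_binomial_form_holds(p):
    """Check (a + b)^p == a^p + b^p for all residue classes a, b mod p."""
    return all(pow(a + b, p, p) == (pow(a, p, p) + pow(b, p, p)) % p
               for a in range(p) for b in range(p))

for p in (2, 3, 5, 7, 11):
    print(p, binomials_divisible(p), simple_binomial_form_holds(p))
```

For a composite modulus such as 4 or 6 both checks fail, in line with the
remark that a composite characteristic brings divisors of zero.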


4.5. Tests for Divisibility
As can be shown by the division algorithm (§2.10), for every integer
$g > 1$ a natural number $n$ has the uniquely determined digital
representation

$$n = \sum_{\nu=0}^{k} a_\nu g^\nu = a_k a_{k-1} \cdots a_0, \qquad
k = \left[\frac{\log n}{\log g}\right], \qquad a_\nu \text{ integral},$$

$$0 \le a_\nu \le g - 1 \qquad (\nu = 0, 1, \ldots, k).$$

Here $q = q(n) = \sum_{\nu=0}^{k} a_\nu$ is called the digital sum of $n$, and
$\alpha = \alpha(n) = a_0 - a_1 + a_2 - \cdots + (-1)^k a_k$ is called the
alternating digital sum. If we choose the decimal representation, i.e.,
$g = 10$, we have the following criteria for divisibility:

1) $3 \mid n$ implies $3 \mid q$ and conversely,

2) $9 \mid n$ implies $9 \mid q$ and conversely,

3) $11 \mid n$ implies $11 \mid \alpha$ and conversely,

4) $2^\lambda \mid n$ implies $2^\lambda \mid a_{\lambda-1} a_{\lambda-2} \cdots a_0$ and conversely.

Proof: Since $10 \equiv 1\ (3)$, we have $10^\nu \equiv 1\ (3)$ for all integers
$\nu \ge 0$. Thus $\sum_{\nu=0}^{k} a_\nu 10^\nu \equiv \sum_{\nu=0}^{k} a_\nu\ (3)$,
which implies 1). In the same way $10^\nu \equiv 1\ (9)$ for all integers
$\nu \ge 0$. As for 3), we have $10 \equiv -1\ (11)$, $10^2 \equiv +1\ (11)$, so
that $10^{2\nu+1} \equiv -1\ (11)$, $10^{2\nu} \equiv +1\ (11)$ for $\nu \ge 0$,
from which it follows at once that $\alpha(n) \equiv n\ (11)$. Finally,
$2 \mid 10$, i.e., $10 \equiv 0\ (2)$, and since a congruence remains correct
if both sides and the modulus are multiplied by the same number, we obtain
$5 \cdot 2^\lambda \equiv 0\ (2^\lambda)$, and thus also
$5^\lambda \cdot 2^\lambda = 10^\lambda \equiv 0\ (2^\lambda)$ for $\lambda > 0$,
so that $a_k a_{k-1} \cdots a_\lambda \cdot 10^\lambda \equiv 0\ (2^\lambda)$,
which implies the criterion 4.

Supplementary remarks: By combining one of the first three with the
fourth of the criteria just given, we can obtain other tests, e.g.: from
$6 \mid n$ follow $3 \mid n$ and $2 \mid n$, and conversely, etc.

In general, the first three criteria are based on the following lemma,
which can easily be proved.


Criterion: Let $(d, 10) = 1$, and let $h$ be the order of 10 mod $d$. Also let
$10^i \equiv g_i\ (d)$ for $i = 0, 1, 2, \ldots, h - 1$. If

$$q_h(n) = a_0 g_0 + a_1 g_1 + \cdots + a_{h-1} g_{h-1} + a_h g_0 + a_{h+1} g_1 + \cdots$$

is the generalized digital sum, then $d \mid n$ implies $d \mid q_h(n)$ and
conversely. For $d = 11$ we have $g_0 = 1$, $g_1 = -1$, where the $g_i$ is
chosen to have the least possible absolute value.
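
The generalized digital sum lends itself to a direct implementation. The
sketch below is a Python illustration under the assumptions of the criterion
($d$ prime to 10, weights $g_i$ of least absolute value); the function names
are hypothetical. A second helper covers criterion 4 for powers of 2.

```python
from math import gcd

def digits_low_to_high(n):
    """Decimal digits a_0, a_1, ..., a_k of a natural number n."""
    return [int(ch) for ch in reversed(str(n))]

def generalized_digit_sum(n, d):
    """q_h(n): digits weighted by g_i, where 10**i is congruent to g_i mod d."""
    if gcd(d, 10) != 1:
        raise ValueError("d must be prime to 10")
    weights, power = [], 1 % d
    while True:
        # Choose the representative of least absolute value.
        weights.append(power if 2 * power <= d else power - d)
        power = (power * 10) % d
        if power == 1 % d:
            break
    h = len(weights)  # the order of 10 mod d
    return sum(a * weights[i % h] for i, a in enumerate(digits_low_to_high(n)))

def divisible_by_power_of_two(n, lam):
    """Criterion 4: only the number formed by the last lam digits matters."""
    return int(str(n)[-lam:]) % 2 ** lam == 0

print(generalized_digit_sum(918082, 11), generalized_digit_sum(918082, 7))
```

For $d = 3$ and $d = 9$ the weights are all 1 (the ordinary digital sum), and
for $d = 11$ they alternate between $1$ and $-1$.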


4.6. Periodic Decimal Fractions 


The following remarks for the base $g = 10$ are equally valid for an
arbitrary integer $g > 1$. By IB1, §4.1, every real number can be expanded
in a unique way as a decimal fraction

$$a_{-l} a_{-l+1} \cdots a_{-1} a_0 \,.\, a_1 a_2 \cdots
\quad (= a_{-l} \cdot 10^{l} + a_{-l+1} \cdot 10^{l-1} + \cdots + a_0 + a_1 \cdot 10^{-1} + a_2 \cdot 10^{-2} + \cdots),$$

$$0 \le a_\nu \le 9 \quad\text{for all } \nu \ge -l,$$

if we agree on the normalization that $a_\nu = 9$ for all large $\nu$ is not
permitted (i.e., there does not exist an $N$ such that $a_\nu = 9$ for all
$\nu \ge N$). If we start from a rational number $r = a/b > 0$ ($a > 0$,
$b > 0$, $\{a, b\} \subset \mathfrak{C}$) and employ the division algorithm to
obtain the successive digits $a_\nu$, the fact that only the numbers
$0, 1, \ldots, b - 1$ can occur as nonnegative remainders means that after at
most $b + 1$ divisions two or more of the remainders must be equal to each
other;^26 thus the sequence of digits must be repeated from a certain
position on, so that we have a periodic decimal fraction:
$r = a_{-l} \cdots a_0 \,.\, a_1 \cdots a_s \overline{a_{s+1} \cdots a_{s+P}}$,
where as usual the periodic part is denoted by overlining. The number $P$ is
called a period, and every multiple of $P$ is also a period. The smallest
$P \ge 1$ is called the primitive period, and in the present section the word
"period" always means the primitive period. If we choose $s \ge 0$ as small
as possible, then $0.a_1 \cdots a_s$ is called the nonperiodic fractional part
and $s$ is its length. The digits $a_{-l} \cdots a_0 \,.\, a_1 \cdots a_s$ are
called the preperiod, and $s + l + 1$ is its length. Also,
$0.a_1 \cdots a_s \overline{a_{s+1} \cdots a_{s+P}}$ is called the fractional
part, which is said to be pure periodic if $s = 0$, but otherwise mixed
periodic.^27 In view of the normalization, the fractional part is smaller
than 1, so that $[r] = a_{-l} \cdots a_0$; here $[r]$ is called the integral
part, $k = l + 1$ is the number of digits before the decimal point, and $r$ is
an $(l + 1)$-place number.

From $10^{l} \le r < 10^{l+1}$ it follows that $l \le \log r/\log 10 < l + 1$,
so that $k = [(\log r)/(\log 10)] + 1$. Our main result is now as follows:

A necessary and sufficient condition for the pure periodicity of $a/b$ is
$(b, 10) = 1$. The period $P$ is then equal to the order of 10 mod $b$.


26 The division $5 : 2$ shows that the number of steps $b + 1$ $(= 3)$ can
actually assume its maximum value. In the present discussion we regard
terminating decimals as having the periodic part "$\overline{0}$."


Proof: If $a/b = 0.\overline{a_1 a_2 \cdots a_P}$, we obtain

$$\frac{a}{b} = \sum_{\lambda=0}^{\infty} \left( \frac{a_1}{10^{\lambda P+1}} + \frac{a_2}{10^{\lambda P+2}} + \cdots + \frac{a_P}{10^{\lambda P+P}} \right)
= \sum_{\lambda=0}^{\infty} \frac{1}{10^{(\lambda+1)P}} \left( a_1 10^{P-1} + a_2 10^{P-2} + \cdots + a_P \right)$$

$$= \frac{A}{10^{P}} \sum_{\lambda=0}^{\infty} \frac{1}{10^{\lambda P}}
= \frac{A}{10^{P}} \cdot \frac{1}{1 - 10^{-P}} = \frac{A}{10^{P} - 1}
\qquad (A = a_1 10^{P-1} + \cdots + a_P).$$

Since $a/b$ was assumed to be in its lowest terms, we have $b \mid 10^{P} - 1$,
so that $(b, 10) = 1$, and from $10^{P} \equiv 1\ (b)$ it follows that $P$ is a
multiple of the order $P'$ of 10 mod $b$. But if $P' < P$, then
$10^{P'} \equiv 1\ (b)$, i.e., $b \mid 10^{P'} - 1$, would imply that $a/b$ can
be brought into the form $a/b = A'/(10^{P'} - 1)$, and, since
$A' < 10^{P'} - 1$, the numerator $A'$ would then have the form
$A' = a_1 10^{P'-1} + \cdots + a_{P'}$, so that we would have
$a/b = 0.\overline{a_1 a_2 \cdots a_{P'}}$, in contradiction to the assumption
that $P$ is a primitive period. Conversely, if we assume $(b, 10) = 1$, then
the order of 10 mod $b$, call it $P$, is such that $b \mid 10^{P} - 1$. But
then, as has just been shown, $a/b$ is periodic with period $P$. For an
arbitrary fraction $a/b$ ($a > 0$, $b > 0$, $a$ and $b$ integers) the length $s$
of the nonperiodic fractional part can be determined at once from the above
theorem: $s$ $(\ge 0)$ is the smallest integer such that the denominator of
the fraction $a \cdot 10^{s}/b$ in its lowest terms is coprime to 10. For then
the fractional part of $a \cdot 10^{s}/b$ is pure periodic.

The greatest possible value of $P$ is $b - 1$, and this value is actually
assumed, for example, for $1/7 = 0.\overline{142857}$ ($P = 6$), whereas for
$1/11 = 0.\overline{09}$ we have $P = 2 < 10$ $(= b - 1)$.
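
The two rules of this section, the length $s$ of the nonperiodic fractional
part and the period $P$ as the order of 10 modulo the remaining denominator,
can be combined in a few lines. This is a minimal sketch in Python with a
hypothetical function name; following the convention of footnote 26, a
terminating decimal is reported with period 1 (periodic part 0).

```python
from math import gcd

def decimal_structure(a, b):
    """Return (s, P) for a/b > 0: nonperiodic length s and primitive period P."""
    b //= gcd(a, b)            # lowest terms; only the denominator matters
    s = 0
    while gcd(b, 10) != 1:     # multiplying a/b by 10 removes one layer of 2s and 5s
        b //= gcd(b, 10)
        s += 1
    period, power = 1, 10 % b  # now gcd(b, 10) == 1: find the order of 10 mod b
    while power != 1 % b:
        power = (power * 10) % b
        period += 1
    return s, period

for a, b in [(1, 7), (1, 11), (1, 12), (3, 14), (5, 2)]:
    print(f"{a}/{b}: s, P = {decimal_structure(a, b)}")
```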


27 These terms are sometimes applied to $r$ itself.

5. Some Number-Theoretic Functions;
The Möbius Inversion Formula


A function $f(x)$ is called a number-theoretic function if it is defined on
a subset of $\mathfrak{C}$, e.g., for all natural numbers. The functions
$\tau(n)$, $\sigma(n)$, $\sigma_k(n)$, defined in §2.12, and also the Euler
function $\varphi(m)$, are number-theoretic functions; the summatory function
of $f(n)$ is a number-theoretic function if $f(n)$ is such a function. A
frequently occurring number-theoretic function is the Möbius function
$\mu(n)$, defined by

$$\mu(n) = \begin{cases}
1 & \text{for } n = 1, \\
(-1)^s & \text{if } n = p_1 p_2 \cdots p_s \text{ is square-free } (p_1, \ldots, p_s \text{ primes}), \\
0 & \text{otherwise.}
\end{cases}$$

In general, an integer $g$ is said to be $k$-free ($k \ge 2$ integral) if
$p^k \nmid g$ for every prime number $p > 1$; for $k = 2$ the number $g$ is also
said to be square-free. Let us also mention the unity function
$\varepsilon(n)$ defined by

$$\varepsilon(n) = \begin{cases} 1 & \text{for } n = 1, \\ 0 & \text{for } n > 1. \end{cases}$$

For $\mu(n)$ we have the following theorem.


Theorem 5: The function $\mu(n)$ is multiplicative (see §2.12), and for the
summatory function of $\mu(n)$ we have

$$\sum_{d \mid n} \mu(d) = \varepsilon(n). \tag{31}$$

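Proof: If one of the numbers $n_1, n_2$ is not square-free, then neither is
$n_1 n_2$, so that $\mu(n_1 n_2) = 0 = \mu(n_1)\mu(n_2)$. If both $n_1, n_2$ are
square-free and coprime, then the multiplicativity is evident. As for
$n = 1$, the formula (31) is obvious; for $n > 1$ let
$n = p_1^{\alpha_1} \cdots p_s^{\alpha_s}$ be the canonical factorization; a
number $d \mid n$ then has the form $d = p_1^{\kappa_1} \cdots p_s^{\kappa_s}$,
$0 \le \kappa_i \le \alpha_i$ ($i = 1, 2, \ldots, s$). Thus

$$\mu(d) = \mu(p_1^{\kappa_1} \cdots p_s^{\kappa_s}) = \prod_{i=1}^{s} \mu(p_i^{\kappa_i}),$$

and therefore

$$\sum_{d \mid n} \mu(d) = \sum_{\kappa_1, \ldots, \kappa_s} \prod_{i=1}^{s} \mu(p_i^{\kappa_i})
= \sum_{\kappa_1=0}^{\alpha_1} \mu(p_1^{\kappa_1}) \times \cdots \times \sum_{\kappa_s=0}^{\alpha_s} \mu(p_s^{\kappa_s})$$

$$= \prod_{i=1}^{s} \sum_{\kappa_i=0}^{\alpha_i} \mu(p_i^{\kappa_i})
= \prod_{i=1}^{s} \left( \mu(1) + \mu(p_i) + \mu(p_i^2) + \cdots + \mu(p_i^{\alpha_i}) \right)$$

$$= \prod_{i=1}^{s} \left( \mu(1) + \mu(p_i) \right) = \prod_{i=1}^{s} (1 - 1) = 0.$$

The Möbius function enables us to prove an important relation between
a function and its summatory function.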


Theorem 6: Every function $F(x)$ defined on the set $\mathfrak{Z}$ of all
natural numbers $n > 0$ is the summatory function of a uniquely determined
function $f(x)$ defined on $\mathfrak{Z}$, if the values of the function are the
elements of an Abelian group $\mathfrak{G}$ (e.g., $\mathfrak{C}^+$ or
$P^\times$).

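Proof: Let $\mathfrak{G}$ be a module. For every $g \in \mathfrak{G}$ we then
have $\mu(n) g = 0$, $+g$ or $-g$, according as $\mu(n) = 0$, $+1$, or $-1$.^28
For all integers $k > 0$ we now set

$$f(k) = \sum_{d \mid k} \mu\!\left(\frac{k}{d}\right) F(d)
\quad \left( = \sum_{d \mid k} \mu(d)\, F\!\left(\frac{k}{d}\right) \right); \tag{32}$$

thus $f(k)$ is an element of $\mathfrak{G}$. For the summatory function of
$f(k)$ we have

$$\sum_{d \mid n} f(d) = \sum_{d \mid n} f\!\left(\frac{n}{d}\right)
= \sum_{d \mid n} \sum_{d' \mid (n/d)} \mu\!\left(\frac{n}{dd'}\right) F(d')
= \sum_{dd' \mid n} \mu\!\left(\frac{n}{dd'}\right) F(d')$$

$$= \sum_{d' \mid n} F(d') \sum_{d \mid (n/d')} \mu\!\left(\frac{n}{dd'}\right)
= \sum_{d' \mid n} F(d') \sum_{d \mid (n/d')} \mu(d)
= \sum_{d' \mid n} F(d')\, \varepsilon\!\left(\frac{n}{d'}\right) = F(n).$$

It remains to prove the uniqueness. Let $g(x)$ be an arbitrary function such
that $\sum_{d \mid n} g(d) = F(n)$. It follows from (32) that

$$f(n) = \sum_{d \mid n} \mu(d)\, F\!\left(\frac{n}{d}\right)
= \sum_{d \mid n} \mu(d) \sum_{d' \mid (n/d)} g(d')
= \sum_{dd' \mid n} \mu(d)\, g(d')$$

$$= \sum_{d' \mid n} g(d') \sum_{d \mid (n/d')} \mu(d)
= \sum_{d' \mid n} g(d')\, \varepsilon\!\left(\frac{n}{d'}\right) = g(n).$$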


Theorem 6, together with (32), is called the Möbius inversion formula:


From $F(n) = \sum_{d \mid n} f(d)$ it follows that
$f(n) = \sum_{d \mid n} \mu(n/d)\, F(d)$ and conversely.
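
Formula (31) and the inversion formula can both be verified numerically for
integer-valued functions. The following is a minimal sketch in Python; the
helper names are illustrative, and the divisors are found by naive trial
division, which is adequate only for small $n$.

```python
def mobius(n):
    """Moebius function of a natural number n, by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p**2 divides the original n
                return 0
            result = -result
        p += 1
    if n > 1:                   # one remaining prime factor
        result = -result
    return result

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def summatory(f, n):
    """F(n) = sum of f(d) over all d dividing n."""
    return sum(f(d) for d in divisors(n))

def mobius_inverse(F, n):
    """f(n) = sum over d | n of mu(n/d) * F(d), as in formula (32)."""
    return sum(mobius(n // d) * F(d) for d in divisors(n))

def f(n):                       # an arbitrary test function
    return n * n + 1

def F(n):                       # its summatory function
    return summatory(f, n)

print([mobius(n) for n in range(1, 13)])
print([mobius_inverse(F, n) == f(n) for n in range(1, 13)])
```

Applying `mobius_inverse` to the identity function $F(n) = n$ returns the
values of the Euler function, since $n$ is the sum of $\varphi(d)$ over all
$d$ dividing $n$.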


28 For every module $\Omega$ the ring $\mathfrak{C}$ may be regarded as an
operator domain; we then define

$$n\gamma = \sum_{\nu=1}^{n} \gamma, \quad\text{if } n > 0,\ n \in \mathfrak{C},\ \gamma \in \Omega,$$

and $(-n)\gamma = -(n\gamma)$ $(n > 0)$, which is obviously in agreement with
the above text. Compare also footnote 4, §2.5.

If the operation in $\mathfrak{G}$ is written multiplicatively, the formulas
are as follows:

$$F(n) = \prod_{d \mid n} f(d), \quad\text{and}\quad f(n) = \prod_{d \mid n} F(d)^{\mu(n/d)}.$$


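Let us now use these formulas to prove the following theorem for the
Euler $\varphi$-function:


Theorem 7: The function $\varphi(n)$ is multiplicative, and
$\sum_{d \mid n} \varphi(d) = n$; moreover,

$$\varphi(n) = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right)
= \prod_{i=1}^{s} p_i^{\alpha_i - 1}(p_i - 1)
\qquad \left(n = \prod_{i=1}^{s} p_i^{\alpha_i} \text{ canonical}\right).$$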


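Proof: Let $\mathfrak{M}_d$ be the set of all integers $x$ with $1 \le x \le n$
and $(x, n) = d$. Then obviously

$$\bigcup_{d=1}^{n} \mathfrak{M}_d = \bigcup_{d \mid n} \mathfrak{M}_d = \{1, 2, \ldots, n\}
\quad\text{and}\quad \mathfrak{M}_{d_1} \cap \mathfrak{M}_{d_2} = \emptyset \ \text{ for } d_1 \ne d_2, \tag{33}$$

and $\mathfrak{M}_d$ consists of exactly those multiples $\lambda d$ of $d$ for
which $(\lambda, n/d) = 1$ and $1 \le \lambda \le n/d$. The $\mathfrak{M}_d$
with $d \mid n$ thus contains $\varphi(n/d)$ numbers. From (33) it follows at
once that:

$$n = \sum_{d \mid n} \varphi\!\left(\frac{n}{d}\right) = \sum_{d \mid n} \varphi(d).$$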


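Thus from the Möbius inversion formula for $n = ab$, $(a, b) = 1$, we have

$$\varphi(ab) = \varphi(n) = \sum_{d \mid n} \mu(d)\, \frac{n}{d}
= \sum_{\substack{d_1 \mid a,\ d_2 \mid b \\ (d_1, d_2) = 1}} \mu(d_1 d_2)\, \frac{ab}{d_1 d_2}
= \sum_{d_1 \mid a} \sum_{d_2 \mid b} \mu(d_1)\mu(d_2)\, \frac{a}{d_1} \cdot \frac{b}{d_2}$$

$$= \sum_{d_1 \mid a} \mu(d_1)\, \frac{a}{d_1} \cdot \sum_{d_2 \mid b} \mu(d_2)\, \frac{b}{d_2}
= \varphi(a)\,\varphi(b).$$

In view of this multiplicativity, for $n = \prod_{i=1}^{s} p_i^{\alpha_i}$
(canonical) we have at once

$$\varphi(n) = \prod_{i=1}^{s} \varphi(p_i^{\alpha_i}).$$

The numbers from 1 to $p_i^{\alpha_i}$ that are not prime to $p_i^{\alpha_i}$ are
$p_i, 2p_i, 3p_i, \ldots, p_i^{\alpha_i - 1} p_i$, so that

$$\varphi(p_i^{\alpha_i}) = p_i^{\alpha_i} - p_i^{\alpha_i - 1} = p_i^{\alpha_i} \left(1 - \frac{1}{p_i}\right).$$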
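
The statements of theorem 7 and the counting argument of its proof can be
compared with a direct enumeration. The Python sketch below is an
illustration with hypothetical function names; factorization is done by
trial division and is meant for small $n$ only.

```python
from math import gcd

def phi_by_counting(n):
    """Number of x in 1..n with gcd(x, n) == 1."""
    return sum(1 for x in range(1, n + 1) if gcd(x, n) == 1)

def phi_by_product(n):
    """Product of p**(alpha - 1) * (p - 1) over the canonical factorization of n."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            power = 1
            while n % p == 0:
                n //= p
                power *= p
            result *= (power // p) * (p - 1)
        p += 1
    if n > 1:                  # one remaining prime factor with exponent 1
        result *= n - 1
    return result

def class_sizes(n):
    """Sizes of the sets M_d = {x : 1 <= x <= n, gcd(x, n) == d} for d | n."""
    return {d: sum(1 for x in range(1, n + 1) if gcd(x, n) == d)
            for d in range(1, n + 1) if n % d == 0}

print(class_sizes(12))
print([phi_by_product(n) for n in range(1, 13)])
```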


6. The Chinese Remainder Theorem;
Direct Decomposition of $\mathfrak{C}/(m)$


We now seek to find all solutions $x$ of the simultaneous system of
congruences $x \equiv a_i\ (m_i)$ ($i = 1, 2, \ldots, s$), where the $m_i$ are
pairwise coprime; in other words, every solution must satisfy $s$
congruences simultaneously.

The Chinese remainder theorem (fundamental theorem on simultaneous
congruences) reads as follows: the set of solutions of the simultaneous
system of congruences $x \equiv a_i\ (m_i)$ ($i = 1, 2, \ldots, s$) with
$(m_i, m_k) = 1$ for $i \ne k$ consists of all the numbers in a uniquely
determined residue class mod $m_1 m_2 \cdots m_s$.


Proof: If $x_0$ is a fixed solution and $x_1$ an arbitrary solution, it follows
that $x_0 \equiv x_1 \equiv a_i\ (m_i)$ for $i = 1, 2, \ldots, s$, so that
$m_i \mid (x_1 - x_0)$, and in view of the pairwise coprimality we also have
$m = \prod_{i=1}^{s} m_i \mid (x_1 - x_0)$, i.e., $x_1 \equiv x_0\ (m)$;
conversely, since any congruence remains correct when the modulus is
replaced by any of its factors, $x' \equiv x_0\ (m)$ implies
$x' \equiv x_0\ (m_i)$ for all $i$. Thus there can be at most one residue class
mod $m$ in which all the solutions are contained, and every number belonging
to such a residue class is necessarily a solution. In order to prove the
existence of solutions, we set $M_i = m/m_i$ ($i = 1, 2, \ldots, s$); then
obviously $(M_1, M_2, \ldots, M_s) = 1$; thus there exist numbers
$y_1, y_2, \ldots, y_s$ such that $M_1 y_1 + \cdots + M_s y_s = 1$. Let us also
set $e_i = M_i y_i$ ($i = 1, \ldots, s$); then for all $i$ we have the
congruences:

$$e_i \equiv 0\ (M_i), \qquad e_i \equiv 0\ (m_j) \ \text{ for } j \ne i, \tag{34}$$

and thus

$$1 = e_1 + \cdots + e_s \equiv e_i\ (m_i), \ \text{ i.e., } e_i \equiv 1\ (m_i), \quad\text{and}\quad
\sum_{j=1}^{s} a_j e_j \equiv a_i e_i \equiv a_i\ (m_i), \tag{35}$$

so that $a_1 e_1 + \cdots + a_s e_s$ is a solution of the system of
simultaneous congruences.
The $e_i$ have the important property that

$$e_i e_j \equiv \begin{cases} 0 & \text{for } i \ne j, \\ e_i & \text{for } i = j \end{cases} \pmod{m}. \tag{36}$$

Proof: From (34) it follows that $e_i e_j \equiv 0\ (m_k)$ for $i \ne j$ and
$k = 1, \ldots, s$, so that we also have $m \mid e_i e_j$ ($i \ne j$). But this
is exactly the first relation (36), and thus from $1 = e_1 + \cdots + e_s$ it
follows by multiplication with $e_i$ that

$$e_i = \sum_{j=1}^{s} e_i e_j \equiv e_i^2\ (m).$$

An element $a$ of an arbitrary ring $\mathfrak{R}$ such that $a^2 = a$ is called
idempotent. Then (36) states that in $\mathfrak{C}/(m)$ the residue classes
$\bar{e}_1, \ldots, \bar{e}_s$ form a system of orthogonal idempotents (i.e.,
idempotents such that the product of any two of them is equal to zero).
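
The construction in the proof translates directly into a procedure for
solving simultaneous congruences. The sketch below is a Python illustration
(3.8 or later, for `math.prod` and modular inverses via `pow`). One point
differs from the text and is an assumption of this sketch: each $y_i$ is
chosen as an inverse of $M_i$ mod $m_i$, so the $e_i$ sum to 1 only modulo
$m$, which is all that (34) to (36) require. The function names are
hypothetical.

```python
from math import prod

def crt_idempotents(moduli):
    """Return (m, [e_1, ..., e_s]) with e_i = M_i * y_i reduced mod m."""
    m = prod(moduli)
    idempotents = []
    for m_i in moduli:
        M_i = m // m_i
        y_i = pow(M_i, -1, m_i)          # inverse of M_i mod m_i
        idempotents.append((M_i * y_i) % m)
    return m, idempotents

def solve_congruences(remainders, moduli):
    """Smallest nonnegative x with x = a_i (mod m_i) for pairwise coprime m_i."""
    m, e = crt_idempotents(moduli)
    return sum(a_i * e_i for a_i, e_i in zip(remainders, e)) % m

m, e = crt_idempotents([3, 5, 7])
print(m, e)
print(solve_congruences([2, 3, 2], [3, 5, 7]))
```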

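For an arbitrary commutative ring we define the concept of direct sum
as follows.

A ring $\mathfrak{R}$ is the direct sum of the ideals $\mathfrak{R}_i$ (in
$\mathfrak{R}$) ($i = 1, \ldots, s$), or in symbols:

$$\mathfrak{R} = \mathfrak{R}_1 \oplus \mathfrak{R}_2 \oplus \cdots \oplus \mathfrak{R}_s = \bigoplus_{i=1}^{s} \mathfrak{R}_i,$$

if it has the following property:
Every $r \in \mathfrak{R}$ can be represented in exactly one way in the form

$$r = r_1 + r_2 + \cdots + r_s, \qquad r_i \in \mathfrak{R}_i \quad (i = 1, \ldots, s). \tag{37}$$

For an $r \in \mathfrak{R}_i \cap \mathfrak{R}_j$ ($i \ne j$) it follows, since
$r = r + 0 = 0 + r$ cannot be two different representations, that $r = 0$,
i.e., $\mathfrak{R}_i \cap \mathfrak{R}_j = \{0\}$ ($i \ne j$). For arbitrary
$r_i \in \mathfrak{R}_i$, $r_j \in \mathfrak{R}_j$ ($i \ne j$) the fact that all
the $\mathfrak{R}_i$ are ideals implies at once that
$r_i r_j \in \mathfrak{R}_i \cap \mathfrak{R}_j$, so that $r_i r_j = 0$. If for
$\{a, b\} \subset \mathfrak{R}$ we have the representations
$a = r_1 + \cdots + r_s$, $b = r_1' + \cdots + r_s'$ in the form (37), it
follows that

$$ab = \sum_{i=1}^{s} r_i r_i',$$

so that the two elements may be multiplied componentwise. Since
$r_i r_j = 0$ for $r_i \in \mathfrak{R}_i$, $r_j \in \mathfrak{R}_j$, $i \ne j$,
the binomial theorem is valid in the simple form:

$$(r_i + r_j)^k = r_i^k + r_j^k \qquad (k > 0 \text{ integral}). \tag{38}$$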
This decomposition of a ring is analogous to the direct product
decomposition of a group or the direct sum decomposition of a module (see
IB2, §8). If we do not insist on uniqueness in (37), we have simply the sum:

$$\mathfrak{R} = \mathfrak{R}_1 + \cdots + \mathfrak{R}_s.$$

If the ring $\mathfrak{R}$ has a unity element $e$ and if
$e = e_1 + \cdots + e_s$ is the representation (37), then by squaring we
obtain

$$e = e^2 = e_1^2 + \cdots + e_s^2 = e_1 + \cdots + e_s,$$

and thus, in view of $e_i^2 \in \mathfrak{R}_i$ and the uniqueness of (37), we
have the analog of (36)

$$e_i e_j = \begin{cases} 0, & i \ne j, \\ e_i, & i = j. \end{cases}$$
Conversely, if we begin with a decomposition of the unity element
$e = \sum_{i=1}^{s} e_i$ into a sum of orthogonal idempotents, then for an
arbitrary $r \in \mathfrak{R}$, if we denote the principal ideals $(e_i)$ by
$\mathfrak{R}_i$, we have

$$r = re = \sum_{i=1}^{s} r e_i = \sum_{i=1}^{s} r_i \qquad (r_i = r e_i \in \mathfrak{R}_i,\ i = 1, 2, \ldots, s)$$

and for an arbitrary representation $r = \sum_{j=1}^{s} r_j'$,
$r_j' = r_j'' e_j \in \mathfrak{R}_j$,

$$r_i = r e_i = \sum_{j=1}^{s} r_j'' e_j e_i = r_i'' e_i = r_i',$$

so that $r_i = r_i'$ for all $i$; that is, the representation is unique, so
that $\mathfrak{R} = \bigoplus_{i=1}^{s} \mathfrak{R}_i$. Thus (36) provides a
direct decomposition of $\mathfrak{C}/(m)$. Furthermore, we have the following
theorem.


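Theorem 8: The residue class ring $\mathfrak{C}/(m)$ admits the direct
decomposition
$\mathfrak{C}/(m) = (\bar{e}_1) \oplus (\bar{e}_2) \oplus \cdots \oplus (\bar{e}_s)$
($m = m_1 m_2 \cdots m_s$, $(m_i, m_j) = 1$ for $i \ne j$), and for the
principal ideals $(\bar{e}_i)$ we have the ring isomorphism
$(\bar{e}_i) \cong \mathfrak{C}/(m_i)$.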

Proof: It remains only to prove the isomorphism. For an $\bar{r}_i \in (\bar{e}_i)$
the congruence $r_i \equiv a_i e_i \equiv b_i e_i \ (m)$ obviously implies mod $m_i$ that
$r_i \equiv a_i e_i \equiv a_i \equiv b_i e_i \equiv b_i \ (m_i)$; thus the correspondence $r_i \bmod m \to
a_i \bmod m_i$ is a one-valued mapping. In this correspondence every
$\bar{a}_i \in \mathfrak{C}/(m_i)$ occurs as an image, since $a_i e_i \in (e_i \bmod m)$ $(= (\bar{e}_i))$ has
precisely this image. Thus the number of images is equal to $m_i$. We now
show that the principal ideal $(e_i \bmod m)$ contains at most $m_i$ residue
classes, and thus, since the number of images is precisely equal to $m_i$, this
principal ideal contains exactly $m_i$ residue classes, so that the mapping is
one-to-one. From $x \equiv z \ (m_i)$ it follows by the definition of $e_i$ that
$x e_i = x M_i y_i \equiv z e_i = z M_i y_i \ (m_i)$, and in view of the fact that $m/m_i = M_i$,
this congruence also holds mod $m$, so that the principal ideal $(\bar{e}_i)$ consists
of the residue classes (possibly not distinct, so far as we have shown up to
now) $0, e_i, 2e_i, \ldots, (m_i - 1) e_i \bmod m$, which already proves our assertion.
It is easy to show that this mapping preserves addition and multiplication,
i.e., the images of the sum and the product of two residue classes are equal
to the sum and the product, respectively, of the images. But these properties
of one-to-oneness and the preservation of addition and multiplication
are exactly the properties of a ring isomorphism.
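
The decomposition in theorem 8 can be checked numerically. The following is a minimal sketch, assuming (as the proof indicates) that $e_i = M_i y_i$ with $M_i = m/m_i$ and $M_i y_i \equiv 1 \ (m_i)$; the function names `crt_idempotents` and `components` and the sample moduli 4, 3, 5 are illustrative choices, not part of the text.

```python
from math import gcd, prod

def crt_idempotents(moduli):
    """Return (m, [e_1, ..., e_s]) with e_i = M_i * y_i mod m,
    where M_i = m / m_i and M_i * y_i ≡ 1 (mod m_i)."""
    for i, a in enumerate(moduli):
        for b in moduli[i + 1:]:
            if gcd(a, b) != 1:
                raise ValueError("moduli must be pairwise coprime")
    m = prod(moduli)
    idempotents = []
    for m_i in moduli:
        M_i = m // m_i
        y_i = pow(M_i, -1, m_i)  # modular inverse (Python 3.8+)
        idempotents.append(M_i * y_i % m)
    return m, idempotents

def components(r, m, idempotents):
    """Components r_i = r * e_i mod m of r in the direct decomposition (37)."""
    return [r * e % m for e in idempotents]

moduli = [4, 3, 5]
m, es = crt_idempotents(moduli)
print(m, es)  # prints: 60 [45, 40, 36]
```

For these moduli the $e_i$ sum to 1 mod 60, are idempotent, and annihilate one another, and the principal ideal generated by $e_i$ has exactly $m_i$ elements, in line with the proof.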

By componentwise multiplication it follows from

(39) $r \equiv r_i \pmod{m_i}, \qquad (r_i \bmod m) \in (e_i \bmod m),$

that the mapping $(r \bmod m) \to (r_i \bmod m)$ is a (ring) endomorphism
(i.e., a homomorphism of a ring into itself), and thus the mapping
$(r \bmod m) \to (r_i \bmod m_i)$ $(= a_i \bmod m_i)$ is a homomorphism, in view of
the fact that $(\bar{e}_i) \cong \mathfrak{C}/(m_i)$.

If for $m$ we choose the canonical decomposition, so that the $m_i$ are now
prime powers, then the structure of the residue class ring $\mathfrak{C}/(m)$ is already
known if we know the structure of $\mathfrak{C}/(p^\lambda)$ ($p$ a prime), as follows immediately
from the fact that not only addition (trivially), but also multiplication,
as we have shown, can be carried out componentwise. A similar special
case occurs, of course, for the direct product of a group (see IB2, §8),
a fact which is of interest in the present context for the prime residue
class group $\mathfrak{G}_m$. For we see, first of all, that if $(r, m) = 1$ and
$r \equiv r_1 + r_2 + \cdots + r_s \ (m)$ is a decomposition mod $m$ in accordance with
(37), then $r \equiv r_i \ (m_i)$ and $(r, m_i) = 1$, so that we also have $(r_i, m_i) = 1$.
Conversely, if $(r_i, m_i) = 1$ for all $i = 1, \ldots, s$, then it follows from
$r \equiv r_i \ (m_i)$ that $(r, m_i) = 1$ for all $i$; thus also $(r, m) = 1$. If we now
denote by $\mathfrak{U}_m^{(i)}$ the set of residue classes $\bar{r} \in \mathfrak{C}/(m)$ for which

$$r \equiv e_1 + \cdots + e_{i-1} + \lambda_i e_i + e_{i+1} + \cdots + e_s \pmod{m}, \qquad (\lambda_i, m_i) = 1,$$

it follows at once by componentwise multiplication that $\mathfrak{U}_m^{(i)}$ is a subgroup²⁹
in $\mathfrak{G}_m$. Thus we have the following theorem:


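Theorem 9: $\mathfrak{G}_m$ is the direct product of the $\mathfrak{U}_m^{(i)}$:

(40) $\mathfrak{G}_m = \mathfrak{U}_m^{(1)} \times \mathfrak{U}_m^{(2)} \times \cdots \times \mathfrak{U}_m^{(s)} = \prod_{i=1}^{s} \mathfrak{U}_m^{(i)};$

and we have the group isomorphism $\mathfrak{U}_m^{(i)} \cong \mathfrak{G}_{m_i}$.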

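Proof: Since

$$(e_1 + \cdots + e_{i-1} + \lambda_i e_i + e_{i+1} + \cdots + e_s)(e_1 + \cdots + e_{i-1} + \mu_i e_i + e_{i+1} + \cdots + e_s)$$
$$\equiv e_1 + \cdots + e_{i-1} + \lambda_i \mu_i e_i + e_{i+1} + \cdots + e_s \pmod{m} \equiv \lambda_i \mu_i e_i \equiv \lambda_i \mu_i \pmod{m_i},$$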


29 In order to prove that a subset $\mathfrak{T}$ of a finite group $\mathfrak{G}$ is a subgroup it is sufficient to
show that the product of every two elements in $\mathfrak{T}$ belongs to $\mathfrak{T}$; and the $\mathfrak{G}_m$ in the text
is certainly finite.

the mapping

$$(e_1 + \cdots + e_{i-1} + \lambda_i e_i + e_{i+1} + \cdots + e_s) \bmod m \;\to\; \lambda_i \bmod m_i$$

preserves addition and multiplication and is therefore a homomorphism
of $\mathfrak{U}_m^{(i)}$ onto³⁰ $\mathfrak{G}_{m_i}$; thus $\mathfrak{U}_m^{(i)}$ has at least $\varphi(m_i)$ $(= |\mathfrak{G}_{m_i}|)$ elements. The
one-to-oneness follows from the above result that, as in the proof of
theorem 8, $(r, m) = 1$ is equivalent to $(r_j, m_j) = 1$ for all $j = 1, 2, \ldots, s$
(in the present case $r_j = 1$ for $j \neq i$, $r_i = \lambda_i$), and thus the homomorphism
in question here is actually an isomorphism. By componentwise multiplication
we further see that

$$\sum_{i=1}^{s} \lambda_i e_i \equiv \prod_{i=1}^{s} (e_1 + \cdots + e_{i-1} + \lambda_i e_i + e_{i+1} + \cdots + e_s) \pmod{m},$$

where the factors on the right-hand side belong successively to $\mathfrak{U}_m^{(1)}, \ldots,
\mathfrak{U}_m^{(s)}$, so that $\mathfrak{G}_m = \mathfrak{U}_m^{(1)} \cdot \mathfrak{U}_m^{(2)} \cdots \mathfrak{U}_m^{(s)}$. Since $\varphi(m) = \prod_i \varphi(m_i) = |\mathfrak{G}_m| =
\prod_i |\mathfrak{U}_m^{(i)}|$, the representation of the elements of $\mathfrak{G}_m$ as products of
elements in $\mathfrak{U}_m^{(1)}, \mathfrak{U}_m^{(2)}, \ldots, \mathfrak{U}_m^{(s)}$ must be unique, so that the product
decomposition of $\mathfrak{G}_m$ is direct.
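
Theorem 9 can be illustrated for a small modulus. This is a sketch under the assumption that the idempotents are built as $e_i = M_i y_i$ (as in the proof of theorem 8) and that the sum of the remaining idempotents is written as $1 - e_i$; the helper names and the modulus $60 = 4 \cdot 3 \cdot 5$ are hypothetical choices.

```python
from math import gcd, prod
from itertools import product

def idempotents(moduli):
    """Orthogonal idempotents e_i = M_i * y_i mod m for pairwise coprime moduli."""
    m = prod(moduli)
    return m, [(m // mi) * pow(m // mi, -1, mi) % m for mi in moduli]

def unit_components(moduli):
    """Subgroups U^(i): classes e_1 + ... + lam*e_i + ... + e_s with (lam, m_i) = 1."""
    m, es = idempotents(moduli)
    comps = []
    for mi, e in zip(moduli, es):
        # the sum of all idempotents except e_i is 1 - e_i (mod m)
        comps.append(sorted({(1 - e + lam * e) % m
                             for lam in range(mi) if gcd(lam, mi) == 1}))
    return m, comps

m, comps = unit_components([4, 3, 5])
units = sorted(r for r in range(m) if gcd(r, m) == 1)
# every prime residue class arises exactly once as a product of one
# element from each component
products = sorted(prod(t) % m for t in product(*comps))
print(comps)  # prints: [[1, 31], [1, 41], [1, 13, 37, 49]]
```

The component sizes 2, 2, 4 are $\varphi(4), \varphi(3), \varphi(5)$, and the 16 products coincide with the 16 prime residue classes mod 60.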

If we again choose for $m$ the canonical decomposition, we see that the
structure of the groups $\mathfrak{G}_m$ is known if we know the structure of the prime
residue class groups for moduli which are powers of primes. Regarding the
structure of these groups (see, e.g., Hasse [3], [4], Scholz-Schöneberg [1])
we have the following theorem:


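Theorem 10: For all primes $p > 2$ and all $\lambda \geq 1$ the groups $\mathfrak{G}_{p^\lambda}$ are
cyclic of order $\varphi(p^\lambda) = p^{\lambda-1}(p - 1)$, and the same statement is true (though
trivial) for $\mathfrak{G}_2$ and $\mathfrak{G}_4$. For $\lambda \geq 3$ the group $\mathfrak{G}_{2^\lambda}$ is the direct product of two
cyclic groups, one of which is always of order 2; e.g., $\mathfrak{G}_{2^\lambda} = (-1) \times (5)$.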

It is also easy to show that every finite cyclic group can be represented
as the direct product of cyclic groups of prime power order. This fact, taken
together with theorem 10, shows that in theorem 9 we have a decomposition
of the group $\mathfrak{G}_m$ in the sense of the fundamental theorem for finite
Abelian groups (see IB2, §9.2).
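
The statements of theorem 10 can be spot-checked by brute force for small moduli. The sketch below is only an illustration (not a proof); the function names are hypothetical, and the moduli are assumed to be at least 2.

```python
from math import gcd

def order(a, m):
    """Multiplicative order of a mod m; assumes gcd(a, m) == 1 and m >= 2."""
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k

def is_cyclic(m):
    """True if the prime residue class group mod m (m >= 2) has a generator."""
    group = [r for r in range(m) if gcd(r, m) == 1]
    return any(order(g, m) == len(group) for g in group)

def generated_by_minus_one_and_five(lam):
    """Check that every odd residue mod 2**lam is of the form (+-1) * 5**b."""
    m = 2 ** lam
    elems = {s * pow(5, b, m) % m for s in (1, -1) for b in range(m // 4)}
    return elems == {r for r in range(m) if r % 2 == 1}

print(is_cyclic(27), is_cyclic(4), is_cyclic(8))  # prints: True True False
```

For $\lambda \geq 3$ the element 5 has order $2^{\lambda-2}$ and $-1$ has order 2, which accounts for all $2^{\lambda-1}$ odd residues.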


7. Diophantine Equations; Algebraic Congruences 


7.1. If $y = f(x_1, x_2, \ldots, x_n)$ is an arbitrary function which is defined
at least for all integers $x_i$, then $f(x_1, x_2, \ldots, x_n) = 0$ is called a Diophantine


30 A mapping “onto” means that every element of $\mathfrak{G}_{m_i}$ appears as an image; in the
present case this property, together with the uniqueness of the correspondence, is
proved in the same way as in theorem 8.

equation, provided we are seeking all integral solutions $(x_1, \ldots, x_n)$.
Closely related is the concept of an algebraic congruence, by which we
mean the problem of determining all (integral) solutions of the congruence
$g(x_1, x_2, \ldots, x_n) \equiv 0 \ (m)$, where $g(x_1, x_2, \ldots, x_n)$ is a polynomial in the
indeterminates $x_1, x_2, \ldots, x_n$ with integral coefficients. This congruence is
obviously equivalent to the Diophantine equation $g(x_1, x_2, \ldots, x_n) + ym = 0$
in the unknowns $x_1, x_2, \ldots, x_n, y$. Since congruences can be added,
subtracted, and multiplied, or in other words since $\mathfrak{C}/(m)$ is a ring, it
follows from $x_i \equiv y_i \ (m)$ $(i = 1, \ldots, n)$ that $g(x_1, x_2, \ldots, x_n) \equiv
g(y_1, \ldots, y_n) \ (m)$, so that every solution of an algebraic congruence (also
called a root of the congruence) is an $n$-tuple of residue classes mod $m$.
The definition of systems of Diophantine equations or algebraic congruences
is similar. If $l(f_1, \ldots, f_k; m) = l(m)$ is the number of solutions of the
system of congruences $f_i(x_1, \ldots, x_n) \equiv 0 \ (m)$ $(i = 1, \ldots, k)$, it is obvious,
from the trivial estimate $0 \leq l(m) \leq m^n$, that $l(m)$ exists. Furthermore,
for fixed $f_i(x_1, \ldots, x_n)$ $(i = 1, \ldots, k)$ we have the following theorem:


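Theorem 11: $l(m)$ is a multiplicative function.


Proof: Let $m = m_1 m_2$, $(m_1, m_2) = 1$, and let $\mathfrak{C}/(m) = \mathfrak{R}_1 \oplus \mathfrak{R}_2$ be
the corresponding representation as a direct sum, so that $\mathfrak{R}_i \cong \mathfrak{C}/(m_i)$
$(i = 1, 2)$. If we decompose a solution $x_{10}, \ldots, x_{n0} \pmod{m}$ in the sense
of (37): $x_{i0} \equiv x_{i1} + x_{i2} \ (m)$, $(x_{i1}) \in \mathfrak{R}_1$, $(x_{i2}) \in \mathfrak{R}_2$ $(i = 1, \ldots, n)$, we obtain

$$f_i(x_{10}, \ldots, x_{n0}) \equiv f_i(x_{11} + x_{12}, \ldots) \equiv 0 \pmod{m} \;\Rightarrow\; \begin{cases} f_i(x_{11}, \ldots, x_{n1}) \equiv 0 \pmod{m_1}, \\ f_i(x_{12}, \ldots, x_{n2}) \equiv 0 \pmod{m_2}. \end{cases}$$

Conversely, if we start from solutions $x_{11}, \ldots$ of $f_i \equiv 0 \ (m_1)$ and $x_{12}, \ldots$ of
$f_i \equiv 0 \ (m_2)$, $i = 1, 2, \ldots, k$, we see that $x_{11} e_1 + x_{12} e_2, \ldots$ is a solution of
the system $f_i \equiv 0 \ (m)$, since it follows from (34) and (35) that

$$f_i(x_{11} e_1 + x_{12} e_2, \ldots) \equiv f_i(x_{1j}, \ldots) \equiv 0 \pmod{m_j} \qquad (j = 1, 2),$$

so that from $(m_1, m_2) = 1$ we have

$$m \mid f_i(x_{11} e_1 + x_{12} e_2, \ldots).$$

Summing up, we see that the solutions mod $m$ are in one-to-one correspondence
with the pairs of solutions mod $m_1$, mod $m_2$, so that
$l(m) = l(m_1)\, l(m_2)$.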
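
The multiplicativity of the solution count can be observed directly by exhaustive counting. The following sketch assumes a hypothetical sample system (the single congruence $x^2 + y^2 - 1 \equiv 0$); the function name `count_solutions` is likewise an illustrative choice.

```python
from itertools import product

def count_solutions(polys, n_vars, m):
    """Number l(m) of n-tuples of residues mod m on which every
    polynomial of the system vanishes mod m (brute force)."""
    return sum(
        all(f(*xs) % m == 0 for f in polys)
        for xs in product(range(m), repeat=n_vars)
    )

# hypothetical sample system: x^2 + y^2 - 1 ≡ 0
circle = [lambda x, y: x * x + y * y - 1]

l3 = count_solutions(circle, 2, 3)
l4 = count_solutions(circle, 2, 4)
l12 = count_solutions(circle, 2, 12)
print(l3, l4, l12)  # prints: 4 8 32
```

Here $l(12) = l(3)\,l(4)$, as theorem 11 requires for the coprime factors 3 and 4.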

Thus it is sufficient to determine $l(p^\lambda)$ for primes $p$. For a polynomial
$f(x)$ in one indeterminate, it is well known, since $\mathfrak{C}/(p)$ is a field, that
$l(p) \leq$ degree $f(x)$ (see IB4, §1.2). For normalized polynomials
$f(x) = \sum_{i=0}^{n} a_i x^{n-i}$, i.e., with $a_0 = 1$, we have the general theorem: if
$f(x) = x^n + a_1 x^{n-1} + \cdots + a_n \equiv 0 \ (p)$ has no multiple zeros, then neither
has $f(x) \equiv 0 \ (p^\lambda)$ $(\lambda > 1)$, and $l(p) = l(p^\lambda)$; for $f(x) \equiv 0 \ (m)$ ($m = \prod_{i=1}^{s} p_i^{\lambda_i}$
canonical) we thus have $l(m) = \prod_{i=1}^{s} l(p_i^{\lambda_i}) \leq n^s$ (see Scholz-Schöneberg
[1]). The equality is possible.


7.2. The fact that the linear congruence $ax \equiv b \ (m)$ with $(a, m) = 1$ is
uniquely solvable was already proved in §4.2. If $(a, m) = d$, then $d \mid b$ is
obviously necessary for solvability, and if we divide through by $d$,
producing the congruence $(a/d)\,x \equiv b/d \ (m/d)$, we can easily show that the
entire set of solutions consists of exactly $d$ distinct residue classes mod $m$. Thus it
remains only to find a particular solution $x_0$ of $ax \equiv b \ (m)$ with $(a, m) = 1$.
But $x_0 = b a^{\varphi(m)-1}$ is a solution, by the Fermat theorem. Another method of
finding such a solution is as follows. By the Euclidean algorithm we
calculate $x_1, y_1$ such that $(a, m) = 1 = ax_1 + my_1$. Then it is obvious
that $x_1 b$ is also a solution. By §3.1 the Euclidean algorithm is
closely connected with the expansion into a continued fraction; let
$a/m = [b_0, b_1, \ldots, b_n] = A_n/B_n$; then certainly $a = A_n$, $m = B_n$ in
view of the fact that $(a, m) = (A_n, B_n) = 1$, and thus §3.2 (21) gives for
$\nu = n$ the result that

$$A_n B_{n-1} - B_n A_{n-1} = (-1)^{n-1} = a B_{n-1} - m A_{n-1},$$

so that $x_1 = (-1)^{n-1} B_{n-1}$, $y_1 = (-1)^n A_{n-1}$ is a solution of $ax_1 + my_1 = 1$.

Since the congruence $ax \equiv b \ (m)$ is equivalent to the Diophantine
equation $ax + my = b$, we see at once that for $(a, b) = 1$ the entire set of
solutions can be obtained in the form $x = x_0 + \lambda m$, $y = y_0 - \lambda a$, as soon
as we have a particular solution $x_0, y_0$, and that $x_0$ can be determined in
either of the two ways described above.
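
Both ways of finding a particular solution can be sketched in a few lines. The code below is a minimal illustration, assuming non-negative $a$, $b$ and $m \geq 1$; the names `extended_gcd`, `solve_linear_congruence` and `euler_phi` are hypothetical helpers, and $\varphi$ is computed naively.

```python
from math import gcd

def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b) (Euclidean algorithm)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

def solve_linear_congruence(a, b, m):
    """All residues x mod m with a*x ≡ b (mod m), in increasing order."""
    d, x1, _ = extended_gcd(a % m, m)
    if b % d:
        return []                      # d | b is necessary for solvability
    step = m // d                      # solutions are unique mod m/d
    x0 = x1 * (b // d) % step
    return [x0 + k * step for k in range(d)]

def euler_phi(m):
    """Naive Euler function, sufficient for small m."""
    return sum(1 for r in range(1, m + 1) if gcd(r, m) == 1)

print(solve_linear_congruence(3, 1, 7))   # prints: [5]
print(solve_linear_congruence(6, 4, 10))  # prints: [4, 9]
```

For $(a, m) = 1$ the same solution is obtained from $x_0 = b\,a^{\varphi(m)-1}$, e.g. `pow(3, euler_phi(7) - 1, 7)` gives 5.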


7.3. A quadratic Diophantine equation that occurs in many mathematical
contexts is the Pell equation $x^2 - dy^2 = h$ (see Weber [1]),
especially with $h = 1$ and $h = 4$. If $d = a^2$, where $a$ is an integer, and we
set $ay = z$, we arrive at the easily solved equation $x^2 - z^2 = h =
(x - z)(x + z)$. For if $h = h_1 h_2$ is an arbitrary factorization of $h$, then
from $x - z = h_1$, $x + z = h_2$ we obtain at once all possible solutions;
thus if $h_1, h_2$ are either both even or both odd we already have all the
integral solutions, and if $2 \mid h$ but $4 \nmid h$, then the problem is insoluble.

In general, it is easy to see that we may confine our attention to square-free
$d$ and coprime solutions $x, y$; if $h$ is square-free, it is obvious that only
coprime solutions exist. Then we can state an interesting connection with
the theory of (regular) continued fractions (see Perron [2]):

For every solution $x > 0$, $y > 0$, $(x, y) = 1$ of $x^2 - dy^2 = h$ with
$0 < |h| < |\sqrt{d}|$, $d > 1$ square-free, the fraction $x/y$ is a partial quotient
of the (periodic) continued fraction expansion of $\sqrt{d}$, and there exist infinitely
many solutions. If $h = 1$ and $\sqrt{d} = [b_0, \overline{b_1, \ldots, b_s = 2b_0}]$,³¹ then from
$x/y = A_{\rho s-1}/B_{\rho s-1}$ we obtain the entire set of coprime solutions by letting $\rho$
run through all the integers $0, 1, 2, \ldots$ for even $s$ and through all the even
integers $0, 2, 4, \ldots$ for odd $s$; moreover, the partial quotients satisfy the
(Lagrange) relation

$$A_{\rho s-1} + B_{\rho s-1} \sqrt{d} = (A_{s-1} + B_{s-1} \sqrt{d})^{\rho},$$

from which by expanding the right-hand side we obtain formulas for
$A_{\rho s-1}$, $B_{\rho s-1}$ in terms of the smallest positive solutions $A_{s-1}$, $B_{s-1}$.
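
The continued fraction method for $h = 1$ can be sketched as follows. This is an illustration under stated assumptions: the period is computed with the standard recurrence for quadratic irrationals (not spelled out in the text), the recurrences $A_\nu = b_\nu A_{\nu-1} + A_{\nu-2}$, $B_\nu = b_\nu B_{\nu-1} + B_{\nu-2}$ with $A_{-1} = 1$, $B_{-1} = 0$ are assumed from §3, and all function names are hypothetical.

```python
from math import isqrt

def sqrt_cf_period(d):
    """Return (b0, [b1, ..., bs]) with sqrt(d) = [b0, period], bs = 2*b0."""
    b0 = isqrt(d)
    if b0 * b0 == d:
        raise ValueError("d must not be a perfect square")
    period = []
    P, Q, b = 0, 1, b0
    while b != 2 * b0:            # the period closes with the quotient 2*b0
        P = b * Q - P
        Q = (d - P * P) // Q
        b = (b0 + P) // Q
        period.append(b)
    return b0, period

def pell_fundamental(d):
    """Smallest x, y > 0 with x*x - d*y*y == 1, read off from A_{s-1}/B_{s-1}
    for even period length s and from A_{2s-1}/B_{2s-1} for odd s."""
    b0, period = sqrt_cf_period(d)
    s = len(period)
    quotients = [b0] + period * (1 if s % 2 == 0 else 2)
    A_prev, A = 1, b0             # A_{-1}, A_0
    B_prev, B = 0, 1              # B_{-1}, B_0
    for b in quotients[1:-1]:     # stop one step before the end of the period
        A_prev, A = A, b * A + A_prev
        B_prev, B = B, b * B + B_prev
    return A, B

def pell_solutions(d, count):
    """First `count` positive solutions, via powers of x1 + y1*sqrt(d)."""
    x1, y1 = pell_fundamental(d)
    x, y = x1, y1
    out = []
    for _ in range(count):
        out.append((x, y))
        x, y = x1 * x + d * y1 * y, x1 * y + y1 * x
    return out

print(pell_fundamental(7), pell_fundamental(13))  # prints: (8, 3) (649, 180)
```

For $d = 13$ the period length is odd ($s = 5$), so the convergent with index $s - 1$ solves $x^2 - 13y^2 = -1$ (namely $18/5$) and two periods are needed.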


7.4. The question of the solvability of the Diophantine equation
$x^n + y^n = z^n$, $n > 2$ has remained unanswered up to the present day,
although the famous Fermat conjecture states that it is unsolvable. Clearly,
we may consider only $(x, y, z) = 1$ and prime exponents $n = p > 2$, and
it has become customary to divide the problem into the two cases; first
case: $p \nmid xyz$; second case: $p \mid xyz$. In the first case the unsolvability is
known for all $p < 253747889$, and with the help of electronic computers
the second case has been settled up to $p < 4003$ (for some further remarks
see §8.3).

For $n = 2$ it is obvious that

(41) $x = \lambda(u^2 - v^2), \quad y = 2\lambda uv, \quad z = \lambda(u^2 + v^2); \quad u, v, \lambda$ integers,

namely, the so-called Pythagorean triples, are solutions of $x^2 + y^2 = z^2$,
and it can be shown in various ways that they are the only solutions, e.g.,
as follows. If we set $\xi = x/z$, $\eta = y/z$, we obtain the equation of the unit
circle $\xi^2 + \eta^2 = 1$, for which the parameter representation

$$\xi = \cos\varphi = \frac{1 - t^2}{1 + t^2}, \qquad \eta = \sin\varphi = \frac{2t}{1 + t^2} \qquad \left(t = \operatorname{tg}\frac{\varphi}{2}\right)^{32}$$

sets up a one-to-one correspondence, since $t = \eta/(1 + \xi)$, between rational
$t$ and the rational points of the circumference (i.e., points with rational
coordinates). If we now introduce the homogeneous parameters: $t = v/u$
($u, v$ integers), we have

(42) $\xi = \dfrac{x}{z} = \dfrac{u^2 - v^2}{u^2 + v^2}, \qquad \eta = \dfrac{y}{z} = \dfrac{2uv}{u^2 + v^2}.$

31 It can be shown that for a square-free $d$ the continued fraction expansion of $\sqrt{d}$
necessarily has the above form.

32 This representation of the trigonometric functions as rational functions of the
tangent of the half angle is useful in a well-known way for the integration of
$f(\sin x, \cos x, \operatorname{tg} x, \operatorname{ctg} x)$ with rational $f$ (half angle method).

From now on we may confine our attention to coprime $x, y, z$. Then, in
view of the fact that $(2\gamma)^2 \neq (2\alpha + 1)^2 + (2\beta + 1)^2 \equiv 2 \ (4)$, exactly one
of the two integers $x, y$ must be even, and without loss of generality we
may say that $y$ is even; furthermore, we may assume that $(u, v) = 1$, since
otherwise we could cancel $(u, v)^2$. For a common prime divisor $p$ of
$u^2 - v^2$ and $u^2 + v^2$ we have $p \mid u^2 - v^2 \pm (u^2 + v^2)$, and thus $p \mid 2u^2$,
$p \mid 2v^2$, so that from $(u, v) = 1$ it follows that $p = 2$; i.e., $u, v$ are both odd,
since otherwise $u^2 + v^2$ could not be even in view of the fact that $(u, v) = 1$.
If we now cancel 2 from $2uv/(u^2 + v^2)$, both the numerator and the denominator
of the resulting fraction are odd, whereas we must have $y \equiv 0 \ (2)$. Thus a prime
divisor $p$ of this sort cannot exist, so that in (42) we must have
$x = u^2 - v^2$, $y = 2uv$, $z = u^2 + v^2$, as desired.

Parameter representations other than (41) are obtained by unimodular
transformations of $u, v$,

$$u = \alpha u' + \beta v', \quad v = \gamma u' + \delta v', \qquad \begin{vmatrix} \alpha & \beta \\ \gamma & \delta \end{vmatrix} = \pm 1;^{33} \qquad \alpha, \beta, \gamma, \delta \text{ integral},$$

and only from such transformations, since only then is it true that the
$u'$, $v'$ can also be expressed as integral linear forms in the $u, v$.
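
The parameter representation for coprime triples can be compared with an exhaustive search. The sketch below assumes $u > v > 0$, $(u, v) = 1$ and $u, v$ of opposite parity (the case singled out in the argument above); the function names and the bound 50 are illustrative.

```python
from math import gcd, isqrt

def primitive_triples(limit):
    """Coprime triples (x, y, z) with y even and z <= limit, generated from
    x = u^2 - v^2, y = 2uv, z = u^2 + v^2."""
    triples = set()
    for u in range(2, isqrt(limit) + 1):
        for v in range(1, u):
            if gcd(u, v) == 1 and (u - v) % 2 == 1:
                z = u * u + v * v
                if z <= limit:
                    triples.add((u * u - v * v, 2 * u * v, z))
    return triples

def brute_force_primitive(limit):
    """The same set found by direct search over odd x and even y."""
    found = set()
    for y in range(2, limit, 2):
        for x in range(1, limit, 2):
            z2 = x * x + y * y
            z = isqrt(z2)
            if z <= limit and z * z == z2 and gcd(x, y) == 1:
                found.add((x, y, z))
    return found

print(sorted(primitive_triples(30)))
# prints: [(3, 4, 5), (5, 12, 13), (7, 24, 25), (15, 8, 17), (21, 20, 29)]
```

Up to the chosen bound the two sets agree, which is what the uniqueness argument in the text asserts.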


7.5. Of particular importance among the algebraic congruences are
the pure congruences $x^n \equiv a \ (m)$, since the solution of such a congruence
is equivalent to the determination of the $n$th roots in $\mathfrak{C}/(m)$. If
$(a, m) = d = d_1^n d_2$ with $n$-free $d_2$, it is easy to show by setting $x = d_1 y$
that we may restrict our attention to the case $(a, m) = 1$. If $x^n \equiv a \ (m)$,
$(a, m) = 1$ is solvable, $a$ is called an $n$th power residue, and otherwise an
$n$th power nonresidue; in particular, for $n = 2$ we speak of quadratic
residues, denoted by QR, and quadratic nonresidues, denoted by NR.

For the pure congruences with $(a, m) = 1$ it is obvious that $(x_0, m) = 1$
for every solution $x_0$; thus by the Fermat theorem $n$ can always be reduced
modulo the order $\varphi(m)$ of the prime residue class group $\mathfrak{G}_m$, so that we
may restrict attention to $1 \leq n \leq \varphi(m)$. Then $x^{\varphi(m)} - 1 \equiv 0 \ (m)$ has
exactly all the $\varphi(m)$ prime residue classes as its solutions. If $\mathfrak{G}_m$ is cyclic
and $g$ is a generating element (see §6, theorem 10), then every generating
element of $\mathfrak{G}_m$ is a primitive root of the congruence. In $g^\rho$ with $(\rho, \varphi(m)) = 1$
we obtain, as is easily shown, all the primitive roots of the congruence
and the number of them is obviously equal to $\varphi(\varphi(m))$.


For cyclic $\mathfrak{G}_m$ it is obvious that the exponential congruence $b^x \equiv a \ (m)$,
$(a, m) = (b, m) = 1$, with $b$ a primitive root, is always solvable and the solution is unique mod $\varphi(m)$.
In analogy with logarithms, we write $x \bmod \varphi(m) = \operatorname{ind} a$ (to be read: index of $a$


33 A matrix over an arbitrary ring $\mathfrak{R}$, $1 \in \mathfrak{R}$, is called unimodular if its determinant
is a unit, so that in $\mathfrak{C}$ the determinant must be equal to $\pm 1$.

to the base $b$). It is easy to show that the rules for calculation are the same
as for logarithms.
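
Primitive roots and indices are easy to tabulate for a small modulus. The following is a brute-force sketch; the function names are hypothetical, and `index_table` assumes that its first argument really is a primitive root of the given modulus.

```python
from math import gcd

def multiplicative_order(a, m):
    """Order of a in the prime residue class group mod m; assumes gcd(a, m) == 1."""
    k, x = 1, a % m
    while x != 1:
        x = x * a % m
        k += 1
    return k

def primitive_roots(m):
    """All generators of the prime residue class group mod m (empty if not cyclic)."""
    units = [r for r in range(1, m) if gcd(r, m) == 1]
    return [g for g in units if multiplicative_order(g, m) == len(units)]

def index_table(g, m):
    """Map a -> ind a to the base g; assumes g is a primitive root mod m."""
    phi = sum(1 for r in range(1, m) if gcd(r, m) == 1)
    table, x = {}, 1
    for k in range(phi):
        table[x] = k
        x = x * g % m
    return table

ind = index_table(3, 7)
print(primitive_roots(7), ind)
# prints: [3, 5] {1: 0, 3: 1, 2: 2, 6: 3, 4: 4, 5: 5}
```

The logarithm-like rule $\operatorname{ind}(ab) \equiv \operatorname{ind} a + \operatorname{ind} b \pmod{\varphi(m)}$ can be read off the table, e.g. $\operatorname{ind}(2 \cdot 4) = \operatorname{ind} 1 = 0 \equiv 2 + 4 \pmod 6$.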


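For quadratic congruences $x^2 \equiv a \ (p)$, $p > 2$ a prime, $(a, p) = 1$, we
define the Legendre symbol $(a/p)$ (read: $a$ by $p$) by setting

$$\left(\frac{a}{p}\right) = \begin{cases} \phantom{-}1, & \text{if } a \text{ is a QR}, \\ -1, & \text{if } a \text{ is an NR}. \end{cases}$$

If $g$ is a primitive root of the congruence mod $p$, then $1 \equiv g^{p-1} \ (p)$,
so that $(g^{(p-1)/2} - 1)(g^{(p-1)/2} + 1) \equiv 0 \ (p)$, but $g^{(p-1)/2} \not\equiv 1 \ (p)$, so that
$g^{(p-1)/2} + 1 \equiv 0 \ (p)$. Since $\mathfrak{G}_p$ is cyclic (see §6, theorem 10), there exists
a $\lambda$ such that $g^\lambda \equiv a \ (p)$. Thus we have

$$a^{\frac{p-1}{2}} \equiv g^{\lambda \frac{p-1}{2}} \equiv (-1)^{\lambda} \ (p).$$

For $\lambda = 2\mu$ it follows that $a \equiv g^{2\mu} \equiv (g^\mu)^2 \ (p)$, i.e., $(a/p) = 1$ and
$a^{(p-1)/2} \equiv 1 \ (p)$. For odd $\lambda$ we have $a^{(p-1)/2} \equiv -1 \ (p)$, and $a \equiv g^{2\mu+1} \ (p)$
is an NR for the following reason. From $x_0^2 \equiv g^{2\mu+1} \ (p)$ it follows, if we
set $x_0 \equiv g^\sigma \ (p)$, that $g^{2\sigma} \equiv g^{2\mu+1} \ (p)$, so that $2\sigma \equiv 2\mu + 1 \pmod{\varphi(p) =
p - 1}$, and thus it would follow from $2 \mid (p - 1)$ that $2 \mid (2\mu + 1)$, which is
impossible. Summing up, we have the Euler criterion:

$$a^{\frac{p-1}{2}} \equiv \left(\frac{a}{p}\right) \pmod{p}.$$

Since for even $\lambda$ we obtain the QR, and for odd $\lambda$ the NR, it follows that
there are exactly as many QR as NR.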
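
The Euler criterion gives an immediate way to compute the Legendre symbol. This is a minimal sketch; the function names are hypothetical, and $p$ is assumed to be an odd prime (the code does not test primality).

```python
def legendre(a, p):
    """Legendre symbol (a/p) by the Euler criterion; p an odd prime, p not dividing a."""
    if a % p == 0:
        raise ValueError("a must be prime to p")
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1

def quadratic_residues(p):
    """The quadratic residues mod p, found by squaring all prime residue classes."""
    return sorted({x * x % p for x in range(1, p)})

p = 11
print(quadratic_residues(p))                   # prints: [1, 3, 4, 5, 9]
print([legendre(a, p) for a in range(1, p)])   # prints: [1, -1, 1, 1, 1, -1, -1, -1, 1, -1]
```

The list of residues has $(p - 1)/2 = 5$ entries, so there are exactly as many QR as NR, and the symbol computed from the criterion agrees with the list.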

For $p > 2$ and $q > 2$ prime, we have the Gauss law of quadratic
reciprocity (see e.g., Hasse [3], [4], Scholz-Schöneberg [1]):

$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}.$$

Since $(a/p)$ is distributive (over $\mathfrak{G}_p$), we see from the Euler criterion that
as soon as we know the values of the symbols $(-1/p)$ and $(2/p)$ we can
use the reciprocity law to reduce either of the following two questions to
the other: “which $p$ are QR mod $q$?” and “for which moduli $q$ is $p$ a QR?”.
It is also convenient to introduce the Jacobi symbol

$$\left(\frac{a}{m}\right) = \prod_{p \mid m} \left(\frac{a}{p}\right)^{\alpha}, \qquad m = \prod p^{\alpha} \text{ canonical}, \quad (2, m) = 1;$$

then we have the generalized law of reciprocity:

(43) $\left(\dfrac{a}{b}\right)\left(\dfrac{b}{a}\right) = (-1)^{\frac{a-1}{2} \cdot \frac{b-1}{2}}$, if $(ab, 2) = 1$, $a > 0$, $b > 0$.

The definition (43) enables us to calculate the Legendre symbols in a
practical way, since it is easily seen that the numerators in the symbols
may be altered at will modulo the denominators (for the further theory
see, e.g., Hasse [3], [4]).
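
The practical calculation mentioned here can be sketched as follows: the numerator is reduced modulo the denominator, factors of 2 are removed, and the two entries are interchanged by the reciprocity law. The sketch assumes the standard value $(2/n) = (-1)^{(n^2-1)/8}$ for odd $n > 0$, which the text refers to but does not state; the function name `jacobi` is a hypothetical choice.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0; returns 0 if gcd(a, n) > 1."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n                         # numerators may be altered modulo the denominator
    result = 1
    while a:
        while a % 2 == 0:          # remove factors 2, using the value of (2/n)
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                # reciprocity: the sign changes only if both
        if a % 4 == 3 and n % 4 == 3:   # entries are ≡ 3 (mod 4)
            result = -result
        a %= n
    return result if n == 1 else 0

print(jacobi(3, 7), jacobi(2, 15), jacobi(3, 9))  # prints: -1 1 0
```

For a prime denominator the result coincides with the Legendre symbol obtained from the Euler criterion, while avoiding any factorization of the numerator.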


8. Algebraic Numbers 


8.1. Let $f(x)$ be an irreducible polynomial (see §2.6) in $P[x]$ (where $P$
is the field of rational numbers). Then in the field $\mathfrak{K}$ of complex numbers,
$f(x)$ has exactly $n$ zeros $\vartheta_i$ $(i = 1, \ldots, n)$, where $n =$ degree $f(x)$, provided
that every zero is counted according to its multiplicity (= exponent of
$x - \vartheta_i$ in the canonical factorization with respect to $\mathfrak{K}[x]$). Since the
GCD$(f(x), f'(x)) \in P[x]$, it follows from the irreducibility of $f$ that
$(f, f') = 1$, so that multiple zeros are impossible. If $f(x)$ is reducible and
$f(\vartheta_1) = 0$ $(\vartheta_1 \in \mathfrak{K})$, then $\vartheta_1$ is also a zero of an irreducible factor of $f(x)$.
Thus it is sufficient to consider only the zeros of irreducible polynomials;
they are called algebraic numbers; the nonalgebraic numbers in $\mathfrak{K}$ are
called transcendental. If for $f(x) = \sum_{i=0}^{n} a_i x^i$ we set $h = n + \sum_{i=0}^{n} |a_i|$,
then for a given $h$ there exist only finitely many $f(x)$ with integral $a_i$
(which is obviously no restriction of generality) and thus only finitely
many algebraic numbers. Thus for $h = 1, 2, \ldots$ we obtain all algebraic
numbers in a countable sequence. Since $\mathfrak{K}$ itself is not countable (see
IB1, §4.8), it follows that there exist uncountably many transcendental
numbers (for further details see III 13, §2).
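
The finiteness claim behind the counting argument can be made concrete by listing the integral polynomials of a given $h$. The sketch below is an illustration only: it enumerates coefficient tuples with $n \geq 1$ and $a_n \neq 0$ and does not test irreducibility; the function name is hypothetical.

```python
from itertools import product

def polynomials_of_height(h):
    """Coefficient tuples (a_0, ..., a_n) of integral polynomials with n >= 1,
    a_n != 0 and n + |a_0| + ... + |a_n| == h."""
    result = []
    for n in range(1, h):
        budget = h - n                      # what is left for the sum of |a_i|
        for coeffs in product(range(-budget, budget + 1), repeat=n + 1):
            if coeffs[-1] != 0 and sum(abs(c) for c in coeffs) == budget:
                result.append(coeffs)
    return result

print([len(polynomials_of_height(h)) for h in (1, 2, 3)])  # prints: [0, 2, 8]
```

Each such list is finite and each polynomial has finitely many zeros, so running through $h = 1, 2, \ldots$ arranges all algebraic numbers in a sequence.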

If $\vartheta$ is a zero of the irreducible polynomial $f(x)$ with degree $f(x) = n$,
then $f(x)$ is called the defining polynomial of $\vartheta$, and $\vartheta$ is an algebraic
number of $n$th degree.³⁴ It is easy to see that the defining polynomial is
uniquely determined up to associates. If $f(x)$ is a normalized integral
polynomial (i.e., all coefficients $a_i$ are rational integers with $a_n = 1$), then
the zeros are called algebraic integers, or simply integers.

Let $P(\vartheta)$, where $\vartheta$ is algebraic, denote the intersection of all those
extension fields of $P$ in $\mathfrak{K}$ that contain $\vartheta$.³⁵ Then $P(\vartheta)$ is said to be an
algebraic number field of degree $n$, if degree $\vartheta = n$. For example, $P(\sqrt{2})$
consists of all numbers $\alpha + \beta\sqrt{2}$, $\{\alpha, \beta\} \subset P$; and the numbers
$a + b\sqrt{2}$, $\{a, b\} \subset \mathfrak{C}$, are the algebraic integers in $P(\sqrt{2})$. In §2.10 this
set of integers was given as an example of a (Euclidean) ring. In IB7, §2
we will prove in general that every number in $P(\vartheta)$ is algebraic of not
more than the $n$th degree and that $P(\vartheta)$ is identical with the totality of all
numbers of the form $\sum_{i=0}^{n-1} \rho_i \vartheta^i$ (all $\rho_i \in P$). It is even true that this


34 For the interesting continued fraction expansion of the algebraic numbers of 
second degree see §3.3. 
35 Thus $P(\vartheta)$ is the smallest subfield of $\mathfrak{K}$ that contains $P$ and $\vartheta$.




representation is unique, since from $\sum_{i=0}^{n-1} \rho_i \vartheta^i = \sum_{i=0}^{n-1} \rho_i' \vartheta^i$ it would
otherwise follow that $\vartheta$ is a zero of $\sum_{i=0}^{n-1} (\rho_i - \rho_i') x^i$ and consequently of
degree smaller than $n$. Thus in the sense of IB3, §1.3 the ring $P(\vartheta)$ is an
$n$-dimensional vector space with $\{1, \vartheta, \vartheta^2, \ldots, \vartheta^{n-1}\}$ as basis, and we can
pass to other bases by linear transformations with rational coefficients and
nonzero determinant. If $f(x) = \sum_{i=0}^{n} a_i x^i = 0$ is the defining equation
of $\vartheta$, then $\eta = a_n \vartheta$ is an algebraic integer, since it is a zero of the normalized
polynomial $a_n^{n-1} f(x) = y^n + a_{n-1} y^{n-1} + a_n a_{n-2} y^{n-2} + \cdots$ $(y = a_n x)$, so that, besides
the rational integers, $P(\vartheta)$ even contains algebraic integers of $n$th degree,
and it also follows from $\vartheta = \eta/a_n$ that every algebraic number can be
represented as the quotient of two algebraic integers, where the denominator
can even be taken to be a rational integer. Then it can be shown (see Hasse [3],
van der Waerden [3]) that the integers in $P(\vartheta)$ form a ring. For the theory
of numbers in algebraic number fields the following theorem is of fundamental
importance. In $P(\vartheta)$ there exist bases which consist entirely of
integers and have the property that every integer can be represented as a
linear combination with rational integers as coefficients; and conversely,
every such linear combination is an algebraic integer. Such a basis is called
an integer basis. If $\omega_1, \omega_2, \ldots, \omega_n$ is an integer basis and if
$\omega_i, \omega_i', \omega_i'', \ldots, \omega_i^{(n-1)}$ $(i = 1, \ldots, n)$ are the systems of numbers³⁶ conjugate
to the $\omega_i$, the determinant

$$d = \begin{vmatrix} \omega_1 & \omega_1' & \cdots & \omega_1^{(n-1)} \\ \omega_2 & \omega_2' & \cdots & \omega_2^{(n-1)} \\ \vdots & \vdots & & \vdots \\ \omega_n & \omega_n' & \cdots & \omega_n^{(n-1)} \end{vmatrix}^2$$

is called the (field) discriminant of $P(\vartheta)$. Since integer bases can be obtained
from one another only by unimodular transformations, the value of $d$,
as follows from the rule for multiplication of determinants, is the same
for all integer bases and is thus a field invariant: the value of $d$ is a
rational integer, as can be shown from the developments of IB7, §7.

The existence of an integer basis for the ring of integers in $P(\vartheta)$ shows
that we are dealing here with a vector space, in symbols $\mathfrak{C}[\omega_1, \omega_2, \ldots, \omega_n]$.
Since we have also defined multiplication for “vectors,” the $P(\vartheta)$ and
$\mathfrak{C}[\omega_1, \omega_2, \ldots, \omega_n]$ are algebras,³⁷ and $P(\vartheta)$ is also a division algebra.


36 By a system of conjugates we mean the zeros of the defining polynomial, if its degree
is equal to the degree of the field. If $\alpha \in P(\vartheta)$ is of smaller degree, say $k$, it can be proved
(see IB7, §6) that $k \mid n$, and then the system of conjugates is obtained by writing each
zero $n/k$ times.

37 By an algebra we mean a vector space for which there has also been defined an
associative and distributive multiplication; thus an algebra is a ring. It is clear that the
multiplication is fully determined if we can write the products of the basis vectors as

Concerning the zeros of a polynomial with algebraic numbers as
coefficients, it can be shown (by the fundamental theorem on symmetric
polynomials) that there exist polynomials with rational integers as
coefficients which have at least the same zeros. Furthermore, the zeros of
a normalized polynomial with (algebraic) integers for its coefficients are
again integers.

The product $\alpha \alpha' \cdots \alpha^{(n-1)}$ or, what is obviously the same, the absolute
term multiplied by $(-1)^n$, of the normalized defining polynomial of $\alpha$,
is called the norm of $\alpha$ and is denoted by $N(\alpha)$. If $\alpha$ is an algebraic integer,
then obviously $N(\alpha)$ is a rational integer (see also IB8, §1.2).


8.2. As is shown by the example $P(\sqrt{-5})$ in §2.10, the integers in
$P(\sqrt{-5})$ do not form a u.f. ring. It was one of the greatest advances in the
theory of numbers that Kummer found a substitute for the u.f. theorem by
introducing the so-called ideal numbers. As an equivalent concept,
Dedekind introduced the ideals defined in IB5, §3.1 (see also §2.5). But
before we can formulate the theorem which takes the place of the u.f.
theorem, we must make some preliminary remarks.

By the product of the ideals $\mathfrak{a}$ and $\mathfrak{b}$ of a ring $\mathfrak{R}$ we mean the ideal
generated by all the products $ab$ $(a \in \mathfrak{a}, b \in \mathfrak{b})$ or, in other words, the set of
all finite sums $\sum_i a_i b_i$ $(a_i \in \mathfrak{a}, b_i \in \mathfrak{b})$. As in IB5, §3.6, an ideal $\mathfrak{p}$ is called
a prime ideal if $\mathfrak{R}/\mathfrak{p}$ has no divisors of zero; if the prime ideal is a principal
ideal, $\mathfrak{p} = (p)$, then $p$ is called a prime element. Prime elements are
irreducible as is easily shown (cf. IB5, §2.3), but the converse is not
necessarily true; for example, in the ring $\mathfrak{G}$ of all even numbers, 30 is
irreducible but not prime, since $6 \cdot 10 \equiv 0 \pmod{30}$. On the other hand,
in a u.f. ring the irreducible elements are also prime, which is in fact the
fundamental lemma of the theory of divisibility (see §2.6). Thus our
terminology is in agreement with that of §2.6 and IB5, §2.6. Furthermore,
calling an ideal $\mathfrak{a}$ maximal if no ideal $\mathfrak{b}$ exists with $\mathfrak{a} \subset \mathfrak{b} \subset \mathfrak{R}$ $(= (1))$, we
have the theorem that every maximal ideal is prime (see van der Waerden
[2]); but then again the converse is not necessarily true. However, in the
number rings $\mathfrak{C}[\omega_1, \ldots, \omega_n]$ this converse does hold for prime ideals $\mathfrak{p}$ that
are distinct from the zero ideal and the unit ideal.
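
The example of the even numbers can be verified by direct computation. In the sketch below, divisibility is understood inside the ring of even numbers (the cofactor must itself be even), and "irreducible" is taken to mean that no factorization into two even numbers exists; both readings are assumptions made for the illustration, as are the function names.

```python
def divides_in_even_ring(a, b):
    """a | b within the even numbers: b = a*c with c even (a, b even, a != 0)."""
    return b % a == 0 and (b // a) % 2 == 0

def is_irreducible_even(n):
    """True if the positive even number n is not a product of two even numbers."""
    return not any(n % a == 0 and (n // a) % 2 == 0 for a in range(2, n, 2))

print(is_irreducible_even(30))           # prints: True
print(divides_in_even_ring(30, 6 * 10))  # prints: True
print(divides_in_even_ring(30, 6), divides_in_even_ring(30, 10))  # prints: False False
```

Thus 30 divides the product $6 \cdot 10$ without dividing either factor, so it is irreducible but not prime in this ring.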

If we now consider all numbers of the form $a + b\sqrt{-3}$, $\{a, b\} \subset \mathfrak{C}$, i.e.,
the ring $\mathfrak{C}[\sqrt{-3}]$, we do not obtain all the integers in $P(\sqrt{-3})$; for
example, the zero $-\tfrac{1}{2} + \tfrac{1}{2}\sqrt{-3}$ of $x^2 + x + 1$ is an integer but is not
contained in $\mathfrak{C}[\sqrt{-3}]$. A ring $\mathfrak{G}$ of integers that is contained in $P(\vartheta)$ is
said to be integrally closed in its quotient field $P(\vartheta)$ if all the zeros of a
normalized polynomial (with coefficients in $\mathfrak{G}$) that are contained in $P(\vartheta)$

linear combinations of these basis vectors: $\omega_i \omega_k = \sum_{l=1}^{n} c_{ikl}\, \omega_l$; that is, if we know the
structure constants $c_{ikl}$ (cf. IB5, §3.9). If an algebra is a field, it is called a division algebra
(cf. IB8, §3.4).

are already contained in $\mathfrak{G}$ itself.³⁸ Then by definition $\mathfrak{C}[\omega_1, \omega_2, \ldots, \omega_n]$
is integrally closed in $P(\vartheta)$. Finally, it is of especial importance that in
$\mathfrak{C}[\omega_1, \ldots, \omega_n]$ the maximal condition for ideals (see §2.9) is satisfied.

In a ring $\mathfrak{R}$ we say that the theorem of unique factorization into prime
ideals, the u.f.p.i. theorem, is valid if every ideal can be represented in a
unique way (apart from the order of the factors) as the product of powers
of prime ideals; an integral domain is called a u.f.p.i. ring if the u.f.p.i.
theorem holds and if $\mathfrak{a} \subset \mathfrak{b}$ ($\mathfrak{a}$, $\mathfrak{b}$ ideals) implies the existence of an ideal
$\mathfrak{c}$ with $\mathfrak{a} = \mathfrak{b}\mathfrak{c}$. We then have the following fundamental theorem (see
van der Waerden [3]).

An integral domain $\mathfrak{R}$ is a u.f.p.i. ring if and only if the following conditions
are satisfied.


I. $\mathfrak{R}$ is integrally closed in its quotient field.³⁹
II. In $\mathfrak{R}$ the maximal condition (see §2.9) is satisfied.
III. Every prime ideal $\mathfrak{p} \neq (0)$, $\neq \mathfrak{R}$ is maximal.


Consequently, all rings $\mathfrak{C}[\omega_1, \ldots, \omega_n]$ (where $\omega_1, \ldots, \omega_n$ is an integer
basis) are u.f.p.i. rings. It is not true that every u.f.p.i. ring is a u.f. ring,
as was shown by the example of the ring $\mathfrak{C}[\sqrt{-5}]$ at the beginning of §8.2;
and conversely, not every u.f. ring is a u.f.p.i. ring, as is shown by the
example of the polynomial ring $K[x, y]$, where $K$ is a field, since in this
ring the prime ideal $(x)$ is properly contained in the prime ideal $(x, y)$ and
therefore cannot be maximal.


8.3. If $\mathfrak{a}$ is an ideal in $\mathfrak{C}[\omega_1, \ldots, \omega_n]$ and $\alpha \in \mathfrak{a}$, $\alpha \neq 0$, then obviously
$\{\alpha\omega_1, \alpha\omega_2, \ldots, \alpha\omega_n\} \subset \mathfrak{a}$, and the $\alpha\omega_i$ $(i = 1, \ldots, n)$ are linearly independent
over $P$, and therefore also over $\mathfrak{C}$, since the $\omega_i$ are linearly independent over
$\mathfrak{C}$. It can be shown (see van der Waerden [3]) that $\mathfrak{a}$ is a vector space over
$\mathfrak{C}$, which we have just seen to be $n$-dimensional.

Two ideals $\mathfrak{a} = \mathfrak{C}[\alpha_1, \ldots, \alpha_n]$ and $\mathfrak{b} = \mathfrak{C}[\beta_1, \ldots, \beta_n]$ are equivalent,
$\mathfrak{a} \sim \mathfrak{b}$, if there exists an algebraic $\rho$ such that $\alpha_i = \rho\beta_i$ for all $i = 1, \ldots, n$.
This relation is seen at once to be reflexive, symmetric, and transitive. Thus
the set of all ideals falls into classes of equivalent ideals, the so-called
ideal classes. It can be shown that the number $h$ of ideal classes, the
so-called class number, is finite (see Hasse [4], Landau [1], Vol. 3).

If $\zeta$ is a primitive $n$th root of unity, or in other words if $\zeta$ is of the form


38 If $\mathfrak{G}$ has no unity element, then in place of ordinary polynomials we have expressions
of the form $a_1 x^{k-1} + a_2 x^{k-2} + \cdots + a_k + x^k + n_1 x^{k-1} + \cdots + n_{k-1} x + n_k$, $a_i \in \mathfrak{G}$,
$n_i \in \mathfrak{C}$.

39 The property of being “integrally closed” is defined for rings of integers in $P(\vartheta)$
in the same way as before. In the text algebraic numbers and the number field $P(\vartheta)$ were
defined as certain subsets of the field of all complex numbers, but now they are to be
defined as elements of certain algebraic extension fields in the sense of IB7, §1.

$e^{2\pi i k/n}$ with $(k, n) = 1$, then $P(\zeta)$ is called the $n$th cyclotomic field. We
shall examine the special case that $n = p > 2$ is a prime.

A prime number $p > 2$ is said to be regular if $p \nmid h$, where $h$ is the class
number of the $p$th cyclotomic field $P(\zeta)$. Kummer was led to the study of
these fields by his investigations into the Fermat conjecture; he proved that
$x^p + y^p = z^p$ cannot be solved in integral quantities of $P(\zeta)$ if $p$ is regular
(see Landau [1], Vol. 3).


Remark on §8. The theory of numbers in algebraic number fields can be
developed in a way quite different from the ideal-theoretic discussion
given here, namely, on the basis of the theory of valuations (see Hasse [3],
[4]).


9. Additive Number Theory


In our discussion up to now the foreground has been occupied by
questions of divisibility, like the u.f. theorem and the u.f.p.i. theorem,
which are sometimes called questions of the multiplicative theory of
numbers. By additive number theory we mean questions that can be
reduced to the following fundamental problems. Let there be given $n$ sets
$\mathfrak{A}_1, \mathfrak{A}_2, \ldots, \mathfrak{A}_n$ whose elements are non-negative integers. By the sum
$\mathfrak{C} = \mathfrak{A}_1 + \mathfrak{A}_2 + \cdots + \mathfrak{A}_n = \sum_{i=1}^{n} \mathfrak{A}_i$ we mean the set of numbers
$c = \sum_{i=1}^{n} a_i$ $(a_i \in \mathfrak{A}_i)$ (Schnirelmann). If $\mathfrak{A}_1 = \mathfrak{A}_2 = \cdots = \mathfrak{A}_n = \mathfrak{A}$, we
also write $\sum_{i=1}^{n} \mathfrak{A}_i = n\mathfrak{A}$. Let $\mathfrak{A}_i(x)$ $(i = 1, 2, \ldots, n)$ and $\mathfrak{C}(x)$ denote,
respectively, the number of positive numbers $a_i \leq x$ $(a_i \in \mathfrak{A}_i)$
(or $1 \leq c \leq x$ $(c \in \mathfrak{C})$). The investigation of $\mathfrak{C}(x)$ is our first fundamental
problem; the second consists of deciding in how many ways a $c \in \mathfrak{C}$ can
be represented in the above form. It is easy to see that the definition of a
sum can be extended to the case $n = \infty$; then if $0 \notin \mathfrak{A}_i$ for every $i$, the
sum $\sum_{i=1}^{\infty} \mathfrak{A}_i$ is empty.

Let $\mathfrak{P} = \{0, 1, 3, 5, \ldots, p, \ldots\}$ be the set of all primes $p > 0$, $p \neq 2$,
together with 0 and 1; then the (unsolved) Goldbach conjecture states that $3\mathfrak{P} = \mathfrak{Z}$, where
$\mathfrak{Z} = \{0, 1, 2, \ldots\}$ is the set of all non-negative integers. This conjecture is
obviously equivalent to the conjecture that every positive even number
can be expressed as the sum of two primes. If we agree to say that two sets
$\mathfrak{A}$ and $\mathfrak{B}$ are asymptotically equal, $\mathfrak{A} \sim \mathfrak{B}$, if they coincide from some
point on, i.e., if for a sufficiently large $N$ they are identical in the interval
$(N, \infty)$, then it is known at the present time only that $4\mathfrak{P} \sim \mathfrak{Z}$
(Vinogradov).

A set $\mathfrak{B}$ is called a basis of $k$th order if $k\mathfrak{B} = \mathfrak{Z}$, $(k - 1)\mathfrak{B} \neq \mathfrak{Z}$, and $\mathfrak{B}$
is called an asymptotic basis of $k$th order if $k\mathfrak{B} \sim \mathfrak{Z}$, $(k - 1)\mathfrak{B} \nsim \mathfrak{Z}$.
Thus $\mathfrak{P}$ is an asymptotic basis of not more than fourth order and the
Goldbach conjecture states that it is a basis of third order.

If we set $\mathfrak{Z}^{(n)} = \{0, 1^n, 2^n, \ldots, x^n, \ldots\}$, the Waring problem (solved by
Hilbert) states that $\mathfrak{Z}^{(n)}$ is a basis. A special case is the theorem of
Lagrange: $\mathfrak{Z}^{(2)}$ is a basis of fourth order.

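The number $k(m; \mathfrak{A}_1, \ldots, \mathfrak{A}_n)$ of representations of m in the form
$m = a_{1 i_1} + a_{2 i_2} + \cdots + a_{n i_n}$ ($a_{\lambda i_\lambda} \in \mathfrak{A}_\lambda$, $\lambda = 1, 2, \ldots, n$) is called the
composition number of m, and every such representation is called a
composition. It is to be noted that under certain circumstances
representations with the same summands but in different orders are to be
counted a corresponding number of times in $k(m; \mathfrak{A}_1, \ldots)$; for example, if
$\mathfrak{A}_1 = \{0, 3, 5\}$, $\mathfrak{A}_2 = \{3, 5\}$, $\mathfrak{A}_3 = \{2\}$, we have


$10 = 3 + 5 + 2$;  $3 \in \mathfrak{A}_1$, $5 \in \mathfrak{A}_2$, $2 \in \mathfrak{A}_3$,
$\phantom{10} = 5 + 3 + 2$;  $5 \in \mathfrak{A}_1$, $3 \in \mathfrak{A}_2$, $2 \in \mathfrak{A}_3$,


so that $k(10; \mathfrak{A}_1, \mathfrak{A}_2, \mathfrak{A}_3) = 2$. If the order of the summands is explicitly
disregarded, we use the term partition or partition function
$p(m; \mathfrak{A}_1, \ldots, \mathfrak{A}_n)$. The difference is particularly noticeable in the case
$\mathfrak{A}_1 = \mathfrak{A}_2 = \cdots = \mathfrak{A}_n = \mathfrak{A}$. If $n = \infty$ and $\mathfrak{A} = \mathfrak{Z}$, then $p(m; \mathfrak{Z}, \mathfrak{Z}, \ldots) =
p(m)$ is simply called the number of partitions. The composition function
$k(m; \mathfrak{Z}, \mathfrak{Z}, \ldots)$ is infinite for all m and therefore meaningless; thus in
general $k(m; \mathfrak{A}, \mathfrak{A}, \ldots) = k(m, \mathfrak{A})$ is the number of representations of
m with positive summands from $\mathfrak{A}$, account being taken of the order
of the summands.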
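
For small finite sets the difference between compositions and partitions can be checked by brute force. The following Python sketch is only an illustration: the function names `composition_number` and `partition_number` are hypothetical, and the sketch assumes that "disregarding the order" means identifying two representations that use the same multiset of summands.

```python
from itertools import product

def composition_number(m, sets):
    """Count ordered tuples (a_1, ..., a_n), a_i taken from sets[i], with sum m."""
    return sum(1 for combo in product(*sets) if sum(combo) == m)

def partition_number(m, sets):
    """Count the same representations, but identify those that differ only in order."""
    return len({tuple(sorted(combo)) for combo in product(*sets) if sum(combo) == m})

# The example from the text: A_1 = {0, 3, 5}, A_2 = {3, 5}, A_3 = {2}, m = 10.
example = [{0, 3, 5}, {3, 5}, {2}]
print(composition_number(10, example))  # 2: the compositions 3+5+2 and 5+3+2
print(partition_number(10, example))    # 1: both use the summands 2, 3, 5
```

Enumerating the full Cartesian product is feasible only for very small sets, which is in keeping with the remark that explicit calculation is possible only in the rarest cases.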

It is only in the rarest cases that the partition function or composition 
function can be explicitly calculated. For the most part we simply try to 
discover the asymptotic behavior of these functions for m — oo. Similarly, 
the question of the structure of the set of numbers represented by a sum 
can be answered in general only to the extent of finding the asymptotic 
behavior of $C(x)$ for $x \to \infty$, especially since for the most part we know
nothing more about the summands than the asymptotic behavior of the
functions $A_i(x)$ defined above. For example, the prime number theorem states
that $\Pi(x) \sim x/\log x$, where $\Pi(x)$ is the number of prime numbers $\le x$.

For the great majority of special problems it is necessary to employ the 
methods of analysis, in which case we speak of analytic number theory 
both for multiplicative and for additive problems. 

A generalization of the prime number theorem should be mentioned 
here. Let $\mathfrak{P}_{m,k}$ be the set of prime numbers $p \equiv k \pmod{m}$, $(k, m) = 1$; then


$\Pi_{m,k}(x) \sim \dfrac{1}{\varphi(m)} \cdot \dfrac{x}{\log x}$   ($\varphi(m)$ Euler function).


It follows that every residue class mod m whose elements are coprime to m 
contains infinitely many prime numbers (Dirichlet). 
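
The two asymptotic statements can be explored numerically. The sketch below is an illustration only, not a proof: it counts primes with a simple sieve and compares the counts with $x/\log x$ and with the counts in the residue classes mod m. The helper names `primes_upto` and `count_in_class` are hypothetical.

```python
from math import log

def primes_upto(n):
    """Sieve of Eratosthenes: list of all primes <= n (for n >= 1)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            # Cross out the multiples p*p, p*p + p, ... up to n.
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]

def count_in_class(x, m, k):
    """Number of primes p <= x with p congruent to k modulo m."""
    return sum(1 for p in primes_upto(x) if p % m == k % m)

for x in (10**3, 10**4, 10**5):
    pi_x = len(primes_upto(x))
    # Ratio of the prime count to x / log x, and the counts in the two
    # residue classes mod 4 that are coprime to 4 (Euler function value 2).
    print(x, pi_x, pi_x / (x / log(x)), count_in_class(x, 4, 1), count_in_class(x, 4, 3))
```

The two residue classes 1 and 3 mod 4 receive roughly equal shares of the primes in such a table, which is what the factor $1/\varphi(m)$ expresses.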


For a further discussion see also III13, §1. 
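The function $A(x)$, defined as the number of numbers $\le x$ in a set $\mathfrak{A}$, is
often described by means of comparison functions $\psi(x)$, a practice which
has led to the introduction of the following concepts:


$\delta(\mathfrak{A}, \psi(x)) = \operatorname{fin} \dfrac{A(x)}{\psi(x)}$   ($\psi(x)$-density of $\mathfrak{A}$),¹⁰

$\delta_1(\mathfrak{A}, \psi(x)) = \ldots$   (varied $\psi(x)$-density,¹¹ if $0 \in \mathfrak{A}$),

$\delta_*(\mathfrak{A}, \psi(x)) = \liminf_{x \to \infty} \dfrac{A(x)}{\psi(x)}$   (asymptotic $\psi(x)$-density),

$\delta^*(\mathfrak{A}, \psi(x)) = \limsup_{x \to \infty} \dfrac{A(x)}{\psi(x)}$   (upper asymptotic $\psi(x)$-density).


If $\delta_* = \delta^*$, we speak of the natural $\psi(x)$-density of $\mathfrak{A}$. Of particular
importance is the case $\psi(x) = x$, for which we simply write $\delta(\mathfrak{A})$, $\delta_1(\mathfrak{A})$,
and so forth. Then we have the following theorem, which is relatively
simple and easy to prove.

¹⁰ fin denotes the finis inferior (greatest lower bound). It is generally
assumed that $\psi(x) > 0$ for $x > 0$, $\lim_{x \to \infty} \psi(x) = \infty$ and $\psi(x) = O(x)$; the last
condition is obviously a natural one, since every function $A(x)$ must be
such that $A(x) \le x$, so that $A(x) = O(x)$.

¹¹ The function $A(x) + 1 = A(-1, x)$ enumerates all the $a \in \mathfrak{A}$ with $0 \le a \le x$,
provided $0 \in \mathfrak{A}$, whereas $A(x)$ enumerates only the positive $a \le x$.

Schnirelmann basis theorem: If $\delta(\mathfrak{A}) > 0$ and $0 \in \mathfrak{A}$ (or equivalently
$\delta(\mathfrak{A}) > 0$ and $1 \in \mathfrak{A}$), then $\mathfrak{A}$ is a basis.


Proof: From $\delta(\mathfrak{A}) > 0$ it follows that $1 \in \mathfrak{A}$, and so we set
$\mathfrak{A} = \{a_0 = 0, a_1 = 1, a_2, \ldots\}$, $a_0 < a_1 < a_2 < \cdots$; if $a_\nu \in \mathfrak{A}$, then all $a_\nu + a_\mu$
with $a_\nu < a_\nu + a_\mu \le a_{\nu+1} - 1$ belong to $2\mathfrak{A}$; from $a_\mu \le a_{\nu+1} - a_\nu - 1$
it follows that the number of these $a_\nu + a_\mu$ is equal to $A(a_{\nu+1} - a_\nu - 1)$.
If we define n by $a_n \le x < a_{n+1}$, so that $A(x) = n$, and let $(2A)(x)$
denote the number of elements of $2\mathfrak{A}$ that are $\le x$, we obviously have

$$
\begin{aligned}
(2A)(x) &\ge A(x) + \sum_{\nu=0}^{n-1} A(a_{\nu+1} - a_\nu - 1) + A(x - a_n) \\
&\ge A(x) + \sum_{\nu=0}^{n-1} \delta(\mathfrak{A})(a_{\nu+1} - a_\nu - 1) + \delta(\mathfrak{A})(x - a_n) \\
&= A(x) - n\,\delta(\mathfrak{A}) + \delta(\mathfrak{A}) \sum_{\nu=0}^{n-1} (a_{\nu+1} - a_\nu) + \delta(\mathfrak{A})(x - a_n) \\
&= A(x) - A(x)\,\delta(\mathfrak{A}) + \delta(\mathfrak{A})(a_n - a_0) + \delta(\mathfrak{A})(x - a_n) \\
&= A(x)(1 - \delta(\mathfrak{A})) + x\,\delta(\mathfrak{A}) \\
&\ge x\,\delta(\mathfrak{A})(1 - \delta(\mathfrak{A})) + x\,\delta(\mathfrak{A}) \\
&= 2\delta(\mathfrak{A})\,x - \delta(\mathfrak{A})^2 x,
\end{aligned}
$$

and thus
$\delta(2\mathfrak{A}) \ge 2\delta(\mathfrak{A}) - \delta(\mathfrak{A})^2 = 1 - (1 - \delta(\mathfrak{A}))^2$.
Applying this formula to $2\mathfrak{A}$ instead of $\mathfrak{A}$, we obtain
$\delta(4\mathfrak{A}) \ge 1 - (1 - \delta(2\mathfrak{A}))^2 \ge 1 - (1 - \delta(\mathfrak{A}))^4$,
and thus in general by iteration
$\delta(2^k \mathfrak{A}) \ge 1 - (1 - \delta(\mathfrak{A}))^{2^k}$.


The case $\delta(\mathfrak{A}) = 1$ may be disregarded, since in this case $\mathfrak{A} = \mathfrak{Z}$; thus,
since $0 < \delta(\mathfrak{A}) < 1$, there exists a k such that $(1 - \delta(\mathfrak{A}))^{2^k} \le \tfrac12$, and
therefore $\delta(2^k \mathfrak{A}) \ge \tfrac12$. From the following theorem we then have at once
$\delta(2^{k+1} \mathfrak{A}) = 1$; that is, $2^{k+1} \mathfrak{A} = \mathfrak{Z}$, as desired.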

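From $A(x) + B(x) \ge x$ for all $x > 0$ (as follows, for example, from
$\delta(\mathfrak{A}) + \delta(\mathfrak{B}) \ge 1$) and $0 \in \mathfrak{A} \cap \mathfrak{B}$ it follows that $\mathfrak{A} + \mathfrak{B} = \mathfrak{Z}$.


Proof: Assuming that there exists an x such that $x \notin \mathfrak{A} + \mathfrak{B}$, let us set
$\mathfrak{B} = \{b_0 = 0, b_1, b_2, \ldots\}$ and determine n from $b_n < x < b_{n+1}$; such
an n certainly exists since it follows from $x \notin \mathfrak{A} + \mathfrak{B}$ and $\mathfrak{A} \cup \mathfrak{B} \subseteq \mathfrak{A} + \mathfrak{B}$
that $x \notin \mathfrak{B}$. Since $x \notin \mathfrak{A} + \mathfrak{B}$, none of the n numbers $x - b_i$ ($i = 1, 2, \ldots, n$)
can belong to $\mathfrak{A}$, so that $A(x - 1) + n \le x - 1$, and thus, since
$n = B(x - 1) = B(x)$, we have


$x - 1 \ge A(x - 1) + B(x - 1) = A(x) + B(x)$,


in contradiction to the assumption.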

Now in order to prove the basis property for a set $\mathfrak{A}$ with vanishing
density, it is sufficient, in view of the basis theorem, to prove the existence
of a number s such that $\delta(s\mathfrak{A}) > 0$. It was in this way that Schnirelmann
succeeded in proving for the first time the basis property of the set of
prime numbers $\mathfrak{P}$, and in solving the Waring problem in a new way. By
means of this method of Schnirelmann the additive theory of numbers 
has made great progress in recent times (see Ostmann [1]). 
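
The inequality $\delta(2\mathfrak{A}) \ge 1 - (1 - \delta(\mathfrak{A}))^2$ can be tried out on a finite initial segment. The Python sketch below is an illustration under stated assumptions: the names `count_upto`, `density_upto` and `sumset_upto` are hypothetical, and the minimum of $A(x)/x$ over $1 \le x \le N$ is only an upper bound for the true density. The estimate in the proof uses only values of $A$ at arguments $\le x$, so the same inequality holds for these truncated quantities.

```python
from fractions import Fraction

def count_upto(s, x):
    """A(x): number of positive elements of s that are <= x."""
    return sum(1 for a in s if 1 <= a <= x)

def density_upto(s, n):
    """Minimum of A(x)/x over 1 <= x <= n (upper bound for the true density)."""
    return min(Fraction(count_upto(s, x), x) for x in range(1, n + 1))

def sumset_upto(s, t, n):
    """All sums a + b <= n with a in s and b in t."""
    return {a + b for a in s for b in t if a + b <= n}

N = 60
squares = {k * k for k in range(8)}      # 0, 1, 4, ..., 49: all squares <= N
d = density_upto(squares, N)
two = sumset_upto(squares, squares, N)   # sums of two squares
four = sumset_upto(two, two, N)          # sums of four squares

print(d, density_upto(two, N), 1 - (1 - d) ** 2)
print(four == set(range(N + 1)))         # True, in agreement with Lagrange's theorem
```

Exact fractions are used so that the comparison of densities is not affected by rounding.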


CHAPTER 7


Algebraic Extensions of a Field 


Summary 


The original problem of algebra was to find the “solution” of an algebraic 
equation by means of “roots” or, in modern terms, to represent the zeros 
of a given polynomial by “rational expressions” in the zeros of (irre- 
ducible) binomials x" — a. After it had been recognized (Abel) that 
for the general polynomial (of higher than the fourth degree) such a 
representation is impossible, the problem took the form of representing 
the zeros of a given polynomial in such terms as are natural or unavoidable 
for the given case. For this new and more profound problem the Galois 
theory provides a completely satisfactory solution; it shows that the 
natural instruments for the representation of the zeros of a polynomial 
are determined by the structure, in terms of its subfields, of the smallest 
field (a splitting field) that contains all the coefficients and zeros of the 
polynomial. This structure (of the splitting field in terms of its subfields) 
is completely revealed by the structure of a finite group (the Galois group) 
uniquely determined by the polynomial; the Galois group consists of all 
those automorphisms of the splitting field (the isomorphisms of the 
field onto itself) that leave fixed all the elements of the coefficient field 
of the polynomial. From the properties of this group we can deduce 
the natural means of expression for the “solution” of the equation. 
For example, by examining the group alone we can decide whether a 
splitting field which belongs to this group can be constructed by means of 
“radicals” or, in other words, whether a polynomial that corresponds to 
such a splitting field is solvable “by radicals” (binomials). Thus in the
present chapter we first prove for every polynomial the existence of 
a splitting field and its uniqueness up to isomorphism. Then the properties 
of such splitting fields are described and, more generally, the properties 
of a finite extension K’ of an arbitrary field K, i.e. a field K’ which arises 
from K by adjunction of some or all of the zeros of finitely many
polynomials in K[x]. Every finite extension is seen to be a vector space 
of finite dimension over K. On the basis of these results we can construct 
the Galois group of a polynomial and set up the one-to-one correspondence 
of the intermediate fields (intermediate between K and a smallest splitting 
field) with the subgroups of the Galois group, and in particular the 
correspondence of the so-called conjugate subfields with the conjugate 
subgroups. By means of this correspondence we can then set up the 
above-mentioned group-theoretic criterion for “solvability by radicals” 
(cf. § 10.1). 

In classical algebra the coefficients are usually assumed to be complex 
numbers, a restriction which does not correspond to the algebraic nature 
of the problem; thus we shall drop this restriction here and shall almost
always consider polynomials with coefficients from an arbitrary field. 
Since we do not restrict ourselves to number fields it is clear that in the 
present chapter we are nowhere discussing the numerical “solution,” 
or the numerical calculation of the zeros; our discussion is purely algebraic. 


Exercises 


1. Determine the zeros of the polynomial $x^2 - 5x + 3$ and express them
rationally in terms of a zero of the binomial $x^2 - 13$.

2. If the polynomial is $x^2 - 7x - 3$, what binomial can then be chosen?

3. Let the polynomial be $x^2 + 0.8x + 5$, and the binomial $x^2 + 1$.

4. Let the polynomial be $x^2 + 2x + 8$, and the binomial $x^2 + 7$.

5. Let the polynomial be $x^2 + x + 1$; what binomial can then be chosen?

6. If $\alpha$ is a zero of the binomial $x^3 - 2$ and $\beta$ of $x^2 + 3$, show that the
polynomial $x^3 - 3x^2 + 3x - 17$ has the zeros $2\alpha + 1$,
$2\alpha \cdot (-\tfrac12 + \tfrac12 \beta) + 1$, $2\alpha \cdot (-\tfrac12 - \tfrac12 \beta) + 1$.

(Hint. If x is replaced by y + 1 in the given polynomial, a simpler
polynomial is obtained.)

7. Letting $\alpha$ and $\beta$ have the same meaning as in ex. 6, show that the
polynomial $x^3 + 6x + 2$ has the zeros
$\alpha - \alpha^2$, $-\tfrac12(\alpha - \alpha^2) + \tfrac12 \beta(\alpha + \alpha^2)$, $-\tfrac12(\alpha - \alpha^2) - \tfrac12 \beta(\alpha + \alpha^2)$.

8. Let $\beta$ be a zero of $x^2 + 3$ and $\gamma$ a zero of $x^3 - \tfrac12 \cdot (\beta - 1)$. Show that
$\gamma + 1/\gamma$ is a zero of $x^3 - 3x + 1$.


1. The Splitting Field of a Polynomial 


1.1. Adjunction 


As before (cf. IB5, §1.9), we let J[x] denote the integral domain of the
polynomials in the indeterminate x over the integral domain J, i.e. of
the polynomials in x with coefficients in J, where J may, in particular,
be a field K; and by K(x) we denote the quotient field of K[x]. Also,
in IB5, §3.10, we have proved the existence of (at least) one zero a of
a polynomial $g(x) \in K[x]$ irreducible in K[x]; for we extended the field K
to an extension field K(a) by the adjunction of a zero a of g(x). In general, 
we speak of the adjunction of a system S of (arbitrarily many new) 
elements a, b, c,... to a given field K if we have constructed an extension 
field of K which includes S, i.e. all the elements of S; and in fact we have 
in mind the “smallest” such extension field, and we denote it by 
K(S) = K(a, b, c, ...). As a fundamental domain in which the adjunction
is carried out we assume the existence of a (fundamental) field K* 
which is an extension field of K and contains S; thus the existence of 
such a K* must be guaranteed in one way or another. Then the 
“smallest” extension field of K that contains S is to be defined as 
the (set-theoretic) intersection of all extension fields of K in K* that 
contain S; since the intersection of arbitrarily many extension fields of 
K always exists and is again an extension field of K, the desired field K(S) 
must exist. An element $w \in K^*$ belongs to K(S) if and only if w can be
represented as the value of a rational function (determined by w, though 
not uniquely) over K with arguments in S; for we see that every such 
element w belongs to every extension field K’ of K that contains S; and 
the totality of such w, since it forms an extension field of K that contains S
and is therefore contained in every K’, must be the smallest field that 
contains both K and S. In particular: if the system S contains infinitely 
many elements, then for arbitrary $z \in K(S)$ there exists a finite subsystem $S'$
of S such that $z \in K(S')$. Similarly, J[S] denotes the intersection of all
integral domains (in K*) that contain J and S. 


Example. The field $K' = \Pi(i, \sqrt{2}, \sqrt{3}, \ldots, \sqrt{p}, \ldots)$ is the smallest
extension field of the field $\Pi$ of rational numbers that contains i and the
square root of every prime p. As a fundamental field K* here we may take
the field of complex numbers.


Exercises 


9. Let $K = \Pi$ and let K* be either the field of all real numbers or the
field of all real algebraic numbers. Show that the zeros of $x^2 - 5x + 3$
lie in a field which may be denoted by $K(\sqrt{13})$. (Cf. ex. 1.)


10. Let $K = \Pi$ and let K* be either the field of all complex numbers or
the field of all complex algebraic numbers. Show that the zeros of
$x^2 + 0.8x + 5$ lie in K(i). (Cf. ex. 3.)

11. Let K and K* be as in ex. 10. Show that the zeros of $x^2 + x + 1$
lie in $K(i \cdot \sqrt{3})$. (Cf. ex. 5.)

12. Let K and K* be as in the two preceding exercises. The binomial
$x^3 - 2$ has a real zero, denoted by $\sqrt[3]{2}$. Show that all its zeros lie in
$K(\sqrt[3]{2}, i \cdot \sqrt{3})$. Show that this field also contains all the zeros of
$x^3 - 3x^2 + 3x - 17$ (cf. ex. 6) and of $x^3 + 6x + 2$ (cf. ex. 7).


13. Let $g(x) = x^3 - 3x + 1$ and let K and K* be as in ex. 10.
(a) Compute g(−2), g(0), g(1), g(2) and show that the polynomial g
has three distinct zeros $\xi_1, \xi_2, \xi_3$.
(b) Let $\beta$ and $\gamma$ be as in ex. 8. Show that $\xi_1, \xi_2, \xi_3$ are in $K^+ = K(\beta, \gamma)$.
(c) Set $\bar K = K(\xi_1, \xi_2, \xi_3)$. Show that $\bar K = K(\xi_1) = K(\xi_2) = K(\xi_3)$
and that $\bar K$ is a proper subfield of $K^+$. (Hint. $\bar K$ contains only real
numbers.)


14. Let n be a positive integer with n > 1. Set $K = \Pi$ and let K* be the
field of real algebraic numbers. Set $p_n(x) = x^n - 2$ and let the positive
zero of $p_n(x)$ be denoted by $\xi_n$ (thus $\xi_n = \sqrt[n]{2}$). Let S be the set of
all $\xi_n$ with $n \ge 2$ and $S'$ the set of all $\xi_n$ with $n \ge 10$. Show that
$K(S) = K(S')$.


1.2. Isomorphisms 


Two fields are said to be isomorphic to each other if they differ only 
in the notation (and meaning) for their elements and their “operations” 
(addition, multiplication); in other words, with a suitable change of the 
notation (and meaning) for the elements and the operations, a change 
which amounts to a one-to-one mapping (cf. IB1, §2.4 and IB5, §1.13),
isomorphic fields may be identified. Thus “isomorphism” is an equivalence 
relation. From these remarks it is also clear what is meant by isomorphism 
of groups, integral domains, vector spaces, and so forth. The term 
isomorphism is synonymous with isomorphic mapping. 

An isomorphism of a field K onto itself is called an automorphism of K. 
If U is a subfield common to $K'$ and $K''$, and if f is an isomorphism of
$K'$ onto $K''$ in which each element of U remains fixed (u = f(u) for
every $u \in U$), then f is called an isomorphism of $K'$ onto $K''$ over U or
with respect to (or relative to) U. 


Example. In $\Pi(i)$ (see the example in §1.1) let us map i onto −i and
every $r \in \Pi$ onto itself, so that $r_1 + i r_2$ is mapped onto $r_1 - i r_2$; then we
obtain an automorphism of $\Pi(i)$ over $\Pi$.


Exercises 


15. Let $K = \Pi$ and let K* be the field of complex numbers, or else the
field of all complex algebraic numbers. Denote the zeros of $x^3 - 2$ by
$\xi_1, \xi_2, \xi_3$ and let $K_i = K(\xi_i)$ for $1 \le i \le 3$, $\bar K = K(\xi_1, \xi_2) =
K(\xi_1, \xi_2, \xi_3)$. Prove
a) there exists an isomorphism from $K_1$ onto $K_2$ taking every rational
number into itself
b) there exists an automorphism of $\bar K$ taking $\xi_1$ into $\xi_2$ and every
rational number into itself.


16. Let $\xi_1, \xi_2, \xi_3$ and $\bar K$ be as in ex. 13. Prove that there exists an
automorphism of $\bar K$ taking $\xi_1$ into $\xi_2$ and every rational number into itself.


1.3. Irreducible Polynomials. Existence of a Zero 


Let K be a field and let $g(x) = b_n x^n + \cdots + b_0 \in K[x]$, where K[x]
is the integral domain of the polynomials in the indeterminate x with
coefficients $b_\nu$ in K. The problem of finding the zeros of g(x) then becomes
the following extension problem: it is required to adjoin to K (one or more)
zeros $a_1, a_2, \ldots$ of g(x); that is, to adjoin elements $a_i$ such that $g(a_i) = 0$.
Essentially the same problem is solved if we construct an extension
$K'$ of K such that g(x) has a linear factor (or several linear factors) in
$K'[x]$. It turns out that such fields $K'$ can be constructed in a purely
algebraic way, i.e., essentially by computation with polynomials over K.
With respect to adjunction of a single zero of g(x), where g(x) is
irreducible in K[x], we have already (see IB5, §3.10) proved the following
theorem.


Theorem 1. Hypothesis. Let K be a field and let $p(x) \in K[x]$ be irreducible
in K[x] and be of degree $n \ge 2$.


Conclusions. (1) There exists (at least) one extension field L of K in which
p(x) has (at least) one zero a. The extension field K(a) determined by the
adjunction to K of a zero a of p(x) is uniquely determined up to isomorphisms
over K and is isomorphic to the field K[x]/p(x) of the residue classes of
K[x] with respect to p(x).

(2) Moreover, K(a) = K[a], i.e., every $b \in K(a)$ is uniquely representable
in the form $b = c_0 + c_1 a + \cdots + c_{n-1} a^{n-1}$, $c_\nu \in K$. The field K(a) is a
vector space over K of dimension n with the basis $a^0, a^1, \ldots, a^{n-1}$.


Examples. Every rational-complex number g can be represented in exactly
one way in the form $g = c_0 + c_1 i$, where $c_0, c_1$ are rational numbers. If $\Pi$ is
the field of rational numbers, then every $g \in \Pi(\sqrt{2})$ can be represented in
the form $g = c_0 + c_1 \cdot \sqrt{2}$, where $c_0, c_1 \in \Pi$ are uniquely determined by g.
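
Conclusion (2) of Theorem 1 says that computing in K(a) amounts to computing with coefficient vectors $(c_0, \ldots, c_{n-1})$ and reducing powers of a by means of p(a) = 0. The following Python sketch illustrates this for rational coefficients; the function name `mul_mod` is hypothetical, and the sketch assumes that p is monic and given as a coefficient list with the lowest degree first.

```python
from fractions import Fraction

def mul_mod(b, c, p):
    """Multiply b and c in K[x]/p(x).

    b, c: coefficient lists of length n (lowest degree first).
    p:    coefficient list of a monic polynomial of degree n >= 2.
    """
    n = len(p) - 1
    prod = [Fraction(0)] * (2 * n - 1)
    for i, bi in enumerate(b):
        for j, cj in enumerate(c):
            prod[i + j] += Fraction(bi) * Fraction(cj)
    # Eliminate a^k for k >= n, using a^n = -(p_0 + p_1 a + ... + p_{n-1} a^{n-1}).
    for k in range(len(prod) - 1, n - 1, -1):
        lead = prod[k]
        prod[k] = Fraction(0)
        for i in range(n):
            prod[k - n + i] -= lead * Fraction(p[i])
    return prod[:n]

# In the field obtained from x^2 - 2: (1 + a)(-1 + a) = a^2 - 1 = 1.
print(mul_mod([1, 1], [-1, 1], [-2, 0, 1]))
# In the field obtained from x^3 - 2: a^2 * a^2 = a^4 = 2a.
print(mul_mod([0, 0, 1], [0, 0, 1], [-2, 0, 0, 1]))
```

The first product also shows that $-1 + \sqrt{2}$ is the inverse of $1 + \sqrt{2}$, in accordance with K(a) = K[a].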


1.4. Arbitrary Polynomials. Existence of a Splitting Field 


By a splitting field in the wider sense (abbreviated: i.w.s.) of g(x) € K[x] 
we mean an extension Z of K such that in Z[x] the polynomial g(x) splits 
completely into linear factors. By a smallest splitting field Z’ of g(x) 
we then mean a splitting field i.w.s. such that none of its proper sub- 
fields are splitting fields i.w.s. of g(x). In §1.3 we constructed, for a
polynomial p(x) irreducible in K[x], an extension field $K' = K(a)$ such
that $p(x) = (x - a)\,p_1(x)$ with $p_1(x) \in K'[x]$. Now if $g(x) \in K[x]$ is an
arbitrary polynomial, and thus perhaps reducible in K[x], any factor p(x) of
g(x) that is irreducible in K[x] has a zero a such that $g(x) = (x - a)\,g_1(x)$
with $g_1(x) \in K'[x]$, where $K' = K(a)$.¹ Then $g_1(x)$ is of degree n − 1,
if g(x) was of degree $n \ge 2$. For n − 1 = 1 the field $K'$ is already a
splitting field i.w.s. of g(x). For $n - 1 \ge 2$ we can apply the same
procedure to $K_1 = K'$ and to $g_1(x) \in K_1[x]$ as was just now applied to K and
g(x). After at most n − 1 repetitions of this procedure we obtain a
splitting field i.w.s., call it Z, of g(x). But the intersection of all splitting 
fields i.w.s. of g(x) that are subfields of Z is a smallest splitting field 
of g(x), which gives us the following theorem. 


Theorem 2: Existence theorem. For arbitrary $g(x) \in K[x]$ with arbitrary
K there exists at least one smallest splitting field Z. If g(x) can already be 
factored completely into linear factors in K[x], then K = Z. 


We must now ask whether the factorization of g(x) into linear factors 
is the “same” for all smallest splitting fields of g(x); here the word
“same” is meant in the sense that the number of different linear factors,
i.e., the number of distinct zeros, is always the same, and the multiplicities 
are the same in every case. That this is so is the content of the uniqueness 
theorem proved below; the proof depends essentially on the following 
isomorphism theorem, which is also useful in other ways: 


Theorem 3: Isomorphism theorem. Hypotheses. (1) Let f be an
isomorphic mapping of the field $K'$ onto the field $K''$, and let $\bar f$ be the
(uniquely determined) extension² of f to an isomorphic mapping of $K'[x']$
onto $K''[x'']$, where $x'' = \bar f(x')$.


(2) Let $p'(x') \in K'[x']$ be irreducible in $K'[x']$, so that
$p''(x'') = \bar f(p'(x')) \in K''[x'']$
is irreducible in $K''[x'']$.


Conclusion. If $a'$ and $a''$ are zeros of $p'(x')$ and $p''(x'')$, respectively,
then there exists a uniquely determined isomorphism $f^*$ of $K'(a')$ onto
$K''(a'')$ which is an extension of f such that $a'' = f^*(a')$.


Remark. If $g'(x') = a_0 + a_1 x' + a_2 x'^2 + \cdots + a_n x'^n \in K'[x']$, then
$\bar f(g'(x')) = f(a_0) + f(a_1)\,x'' + f(a_2)\,x''^2 + \cdots + f(a_n)\,x''^n \in K''[x'']$.


¹ The case $a \in K$ is included.
² A mapping $\bar f$ of $\bar A$ into (onto) $\bar B$ is called an extension of the mapping f
of A into (onto) B, where $A \subseteq \bar A$, $B \subseteq \bar B$, if $\bar f(a) = f(a)$ for every $a \in A$.

Proof. Let $p'(x')$ be of degree n, so that $p''(x'') = \bar f(p'(x'))$ is also
of degree n. Every element $b' \in K'(a')$ can be represented in the form
$b' = g'(a')$, where the polynomial


$g'(x') = a_0 + a_1 x' + \cdots + a_{n-1} x'^{\,n-1} \in K'[x']$


is in one-to-one correspondence with the $b'$; in particular, $b' = a'$ is
equivalent to $g'(x') = x'$, and the same remarks hold for $b'' \in K''(a'')$.
Thus in order that $f^*$ be an isomorphism of $K'(a')$ onto $K''(a'')$ and also
be an extension of f, or, in other words, in order that $f^*(b') = f(b')$ for
every $b' \in K'$ and $f^*(a') = a''$, it is necessary that $b'' = f^*(b')$, where
$b' = g'(a')$ and $b'' = g''(a'')$, if and only if $\bar f(g'(x')) = g''(x'')$. For we have


$b'' = f^*(b') = f^*(a_0 + a_1 a' + \cdots + a_{n-1} (a')^{n-1})$
$\phantom{b''} = f(a_0) + f(a_1)\,a'' + \cdots + f(a_{n-1})\,(a'')^{n-1}$;


and since $g''(x'')$ is uniquely determined by $b'' = g''(a'')$, we get
$g''(x'') = f(a_0) + f(a_1)\,x'' + \cdots + f(a_{n-1})\,x''^{\,n-1} = \bar f(g'(x'))$. But this
necessary condition for an isomorphism $f^*$ is also sufficient; i.e., if in
$\bar f(g'(x')) = g''(x'')$ we replace $x'$ and $x''$ by $a'$ and $a''$, respectively, we obtain
an isomorphism $f_0$ such that $f_0(g'(a')) = g''(a'')$ and such that $f_0$ is an $f^*$.
For if $\bar f$ is a one-to-one mapping, then so is $f_0$; also,
$\bar f(g'(x')) + \bar f(h'(x')) = \bar f(g'(x') + h'(x'))$, so that
$f_0(g'(a')) + f_0(h'(a')) = f_0(g'(a') + h'(a'))$,
where $h'(x')$ is also of degree not greater than n − 1, and the corresponding
remarks hold for multiplication; finally, $f_0(b') = f(b')$ for every $b' \in K'$
with $f_0(a') = a''$.

From this isomorphism theorem we now deduce the following unique- 
ness theorem. 


Theorem 4: uniqueness theorem for smallest splitting fields. All
smallest splitting fields for an arbitrarily preassigned polynomial g(x)
over K are isomorphic relative to K. More precisely:


(1) If K* is a splitting field i.w.s. of a polynomial g(x) of degree $n \ge 2$
and if g(x) has the zeros $a_1, \ldots, a_n$ in K*, then $Z = K(a_1, \ldots, a_n)$ is the
unique smallest splitting field of g(x) contained in K*.

(2) If $Z' = K(a'_1, \ldots, a'_n)$ and $Z'' = K(a''_1, \ldots, a''_n)$ are two smallest
splitting fields of g(x), with $g(a'_\nu) = g(a''_\nu) = 0$, $\nu = 1, \ldots, n$, there exists
an isomorphism f of $Z'$ onto $Z''$ over K such that $a''_\nu = f(a'_\nu)$, $\nu = 1, \ldots, n$,
under suitable indexing of the $a'_\nu$, $a''_\nu$; thus in particular, every element
of K remains fixed under f.


Corollary. The factorization of g(x) into linear factors is essentially
the same in all splitting fields (i.w.s.) of g(x); in other words: if in a splitting
field $T'$ of g(x) we have the factorization $g(x) = \prod_{\rho=1}^{r'} (x - b'_\rho)^{k'_\rho}$, where
all the $b'_\rho$ are distinct and the $k'_\rho$ are natural (positive) numbers, and if in the
splitting field $T''$ of g(x) we have the factorization $g(x) = \prod_{\rho=1}^{r''} (x - b''_\rho)^{k''_\rho}$,
where again all the $b''_\rho$ are distinct, then necessarily $r' = r'' = r$ and,
with a suitable indexing of the $b''_\rho$, also $k'_\rho = k''_\rho$, $\rho = 1, \ldots, r$. (Here
$n = k'_1 + \cdots + k'_r = k''_1 + \cdots + k''_r$.) If $T'$, $T''$ are smallest splitting
fields, there exists an isomorphism f of $T'$ onto $T''$ over K with $f(b'_\rho) = b''_\rho$.


Proof. Proof of conclusion (1). The field Z is contained in every
splitting field (i.w.s.) of g(x) that is a subfield of K*. As for conclusion (2),
let $g(x) = g_{11}(x) \cdots g_{1 r_1}(x)$ be the factorization of g(x) in K[x] into
irreducible factors. Among these factors let $g_{11}(x)$ be of the highest degree
$n_1 \le n$. We index the $a'_\nu$ and the $a''_\nu$ in such a way that $g_{11}(a'_1) = g_{11}(a''_1) = 0$.
On account of the isomorphism between $K'_1 = K(a'_1)$ and $K''_1 = K(a''_1)$
over K (cf. §1.4, Theorem 3), with $a''_1$ corresponding to $a'_1$, the
factorizations of g(x) in $K'_1[x]$ and $K''_1[x]$ into irreducible factors differ
only in the notation; thus $g(x) = g_{21}(x; a'_1) \cdots g_{2 r_2}(x; a'_1)$ in $K'_1[x]$ and
$g(x) = g_{21}(x; a''_1) \cdots g_{2 r_2}(x; a''_1)$ in $K''_1[x]$ (see §1.4, remark on Theorem 3).
Again let $g_{21}(x; a'_1)$, and therefore also $g_{21}(x; a''_1)$, be a factor of the highest
degree $n_2$, so that $n_2 \le n - 1$. Furthermore, let $a'_2$ and $a''_2$ be zeros of
$g_{21}(x; a'_1)$ and $g_{21}(x; a''_1)$, respectively. From the isomorphism of $K'_1$ and $K''_1$
over K it follows that $K'_2 = K'_1(a'_2)$ and $K''_2 = K''_1(a''_2)$ are isomorphic
over K, where $a'_1$ and $a'_2$ correspond, respectively, to $a''_1$ and $a''_2$. Thus we
can repeat the above procedure for the factorization of g(x) in $K'_2[x]$
and $K''_2[x]$, and after at most n steps we arrive at $K(a'_1, \ldots, a'_n)$ and
$K(a''_1, \ldots, a''_n)$, which are therefore isomorphic over K.


Exercises 


17. Let $K = \Pi$ and let $\xi$ be a zero of the binomial $x^2 - 13$. Then
$K(\xi)$ is a smallest splitting field of this binomial, and also of the
polynomial $x^2 - 5x + 3$ (cf. ex. 1).

18. On the analogy of ex. 17, construct other exercises from exs. 2 to 5.

19. Let $K = \Pi$ and let $\alpha$, $\beta$ be as in exs. 6 and 7. Prove that $K(\alpha, \beta)$ is a
smallest splitting field both for $x^3 - 3x^2 + 3x - 17$ and also for
$x^3 + 6x + 2$ (cf. exs. 6, 7).

20. Let $K = \Pi$ and let $\xi$ be a zero of $x^3 - 3x + 1$. Then $K(\xi)$ is already
a splitting field. Prove that the other two zeros are $\xi^2 - 2$ and
$-\xi^2 - \xi + 2$.


1.5. Application to Equations of the Third Degree 

We now assume that K is of characteristic zero or p > 2, and that 
$g(x) = x^3 + b_2 x^2 + b_1 x + b_0 \in K[x]$, where K is not a splitting field of
g(x).

Theorem 5. (1) If g(x) is irreducible over K, then g(x) is also irreducible
over every extension $K'$ of K which arises from K by adjunction of finitely
many square roots.


(2) If g(x) is reducible over K without being completely reducible to
linear factors over K, then every smallest splitting field of g(x) can be
obtained by the adjunction of a suitable square root. Here “square root”
means a zero of a binomial of second degree that is irreducible over K,
say $x^2 + c$ with $c \in K$.


Proof. If g(x) is reducible over K without being completely reducible,
then g(x) contains exactly one prime factor of second degree
$q(x) = ax^2 + bx + c$ with $a \ne 0$, which by completing the square we can
write in the form of a binomial: $q(x) = a(x + (2a)^{-1} b)^2 + (c - (2a)^{-2} b^2 a)$;
then $(2a)^{-1}$ exists ($2a \ne 0$, since $p \ne 2$). But if g(x) is irreducible over K,
though already reducible over $K_1 = K(a_1)$, where $a_1^2 + c_1 = 0$ with
$c_1 \in K$, then $g(x) = (x - (d_1 + a_1 d_2))\, g_2(x; a_1)$ with $d_1, d_2 \in K$; and thus
the polynomial $r(x) = (x - (d_1 + a_1 d_2))(x - (d_1 - a_1 d_2)) = (x - d_1)^2 + c_1 d_2^2$
belongs to K[x], is of second degree, and has the linear factor
$x - (d_1 + a_1 d_2)$ in common with g(x), a fact which is inconsistent with
the irreducibility of g(x) over K, since the GCD is already determined
in K[x] by the Euclidean algorithm (IB5, §2.9 and IB6, §2.10). Now let $K_n$
be obtained from K by successive adjunction of finitely many “square
roots” $a_1, \ldots, a_n$, where $a_\nu$ is a square root over $K_{\nu-1} = K(a_1, \ldots, a_{\nu-1})$
and $K_0 = K$, $\nu = 1, \ldots, n$; in other words, $a_\nu^2 + c_\nu = 0$ with $c_\nu \in K_{\nu-1}$
and $x^2 + c_\nu$ is irreducible over $K_{\nu-1}$. Then if g(x) is reducible over $K_n$
(but irreducible over $K_0$), there exists a k with $0 \le k \le n - 1$ such that
g(x) is irreducible over $K_k$ but reducible over $K_{k+1}$. Thus in the preceding
argument we can replace K by $K_k$ and $a_1$ by $a_{k+1}$, and finally $K_1$ by
$K_{k+1} = K_k(a_{k+1})$, and in this way again arrive at a contradiction.


Examples. Problem of the duplication of the cube. The problem of
constructing the edge of a cube with twice the volume of a given cube is not
solvable with ruler and compass (such a construction corresponds algebraically
to the solution of equations of the first and second degree). For this problem
leads to the equation $a^3 - 2 = 0$; and $x^3 - 2$ is irreducible in K[x], where K is
the field of rational numbers (in general, if the polynomial
$x^3 + b_2 x^2 + b_1 x + b_0$ has integral coefficients and is reducible over K, then
it has an integral zero (cf. IB5, §4.4) which is a factor of $b_0$; but none of the
numbers $\pm 1$ and $\pm 2$ is a zero of $x^3 - 2$).

Problem of the trisection of an angle. The problem of dividing a given angle
into three equal parts cannot be solved for every angle with ruler and compass.
For the problem leads to an equation $a^3 - 3a + s = 0$, where s is the length
of the chord subtending the given angle, so that 0 < s < 2. But for rational s
the polynomial $x^3 - 3x + s$ is in general irreducible over the field K of rational
numbers, e.g., for s = 1.
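
The criterion used in both examples (a reducible cubic $x^3 + b_2 x^2 + b_1 x + b_0$ with integral coefficients has an integral zero that divides $b_0$) is easy to mechanize. The Python sketch below is an illustration; the function names `integer_zeros` and `is_irreducible_cubic` are hypothetical, and only cubics with leading coefficient 1 are treated.

```python
def integer_zeros(b2, b1, b0):
    """Integer zeros of x^3 + b2*x^2 + b1*x + b0 among the divisors of b0.

    For b0 == 0 only the zero x = 0 is reported, which already shows
    that the polynomial is reducible.
    """
    def g(x):
        return x**3 + b2 * x**2 + b1 * x + b0

    if b0 == 0:
        candidates = [0]
    else:
        divisors = [d for d in range(1, abs(b0) + 1) if b0 % d == 0]
        candidates = [sign * d for d in divisors for sign in (1, -1)]
    return sorted(x for x in candidates if g(x) == 0)

def is_irreducible_cubic(b2, b1, b0):
    """True if the cubic has no integral zero, hence is irreducible over the rationals."""
    return not integer_zeros(b2, b1, b0)

print(is_irreducible_cubic(0, 0, -2))   # x^3 - 2 (duplication of the cube)
print(is_irreducible_cubic(0, -3, 1))   # x^3 - 3x + 1 (trisection with s = 1)
print(integer_zeros(-6, 11, -6))        # (x - 1)(x - 2)(x - 3)
```

The test is sufficient for degree three because a reducible cubic must have a factor of first degree; for higher degrees the absence of zeros no longer implies irreducibility.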

Exercises


21. Prove
(a) the polynomial $x^3 + 6x + 2$ is irreducible over $\Pi$
(b) if $\xi_1$ and $\xi_2$ are two distinct zeros, then $\Pi(\xi_1, \xi_2)$ is a smallest
splitting field (cf. ex. 7).


22. Solve ex. 21 for the following polynomials:
(a) $x^3 - 3x^2 + 3x - 17$,
(c) $x^3 - 3x + 1$.
Prove in case (c) that $\Pi(\xi_1, \xi_2) = \Pi(\xi_1)$ (cf. ex. 13).

23. Determine whether each of the following polynomials is irreducible
over $\Pi$, and for the reducible ones determine the smallest splitting
field:

$x^3 - 5x^2 + 3x + 1$,  $x^3 - 2x + 1$,
$x^3 - x^2 - 8x + 12$,  $x^3 - 5x^2 - 2x + 7$.


2. Finite Extensions 


For the Galois theory, to be developed in the following sections, 
certain other properties of the smallest extension fields of polynomials 
are of essential importance. In describing these properties we begin with 
the fact (cf. §1.3, theorem 1) that the extension K(a) of K generated by 
the adjunction of a single zero a of a polynomial over K is a vector space 
over K with dimension n equal to the degree of the irreducible polynomial 
in K[x] that has a for a zero. However, not only a but every $b \in K(a)$
is a zero of an (irreducible) polynomial in K[x]; in other words, every
$b \in K(a)$ is algebraic over (or with respect to, or relative to) the field K.
For the fact that K(a) is a ring (and even a field) means that $b^1 = b \in K(a)$
implies $b^\nu \in K(a)$ for $\nu = 1, 2, \ldots$, and also $b^0 = 1 \in K \subseteq K(a)$. Since K(a)
is of dimension n over K, the n + 1 elements $b^\nu$, $\nu = 0, 1, \ldots, n$, are
linearly dependent over K (cf. §1.3, Theorem 1); thus there exist elements
$c_\nu \in K$, not all equal to zero, for which $c_0 b^0 + c_1 b^1 + \cdots + c_n b^n = 0$.
Thus b is a zero of a polynomial in K[x] of degree not greater than n
(the case b = 0 is included). For brevity we shall say that an extension
$K'$ of K is algebraic over K if every element $b \in K'$ is algebraic over K.
If $K'$ is algebraic over K, then every intermediate field T between K and $K'$,
i.e., every field T with $K \subseteq T \subseteq K'$, is also algebraic over K; and $K'$ is
itself algebraic over T. Finally, we shall say that an extension E of K is
finite over K if E is a vector space of finite dimension over K. We now
observe that our proof of the fact that $b \in K(a)$ is algebraic over K if a is
algebraic over K made no use of any of the properties of K(a) except
that it is a vector space of finite dimension over K. We thus have the
following theorem. 


Theorem 1. Every extension E of K that is finite over K is algebraic 
over K. In particular, K(a) is algebraic over K if a is algebraic over K. 
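
The argument behind Theorem 1 is constructive: in a space of dimension n the n + 1 powers $b^0, \ldots, b^n$ satisfy a linear relation, and that relation is a polynomial with zero b. The Python sketch below illustrates this for $b = \sqrt{2} + \sqrt{3}$ in the field obtained from the rational numbers by adjoining $\sqrt{2}$ and $\sqrt{3}$. It assumes, as an illustration only, that elements are stored as coordinate vectors with respect to $1, \sqrt{2}, \sqrt{3}, \sqrt{6}$; the function names `mul` and `dependency` are hypothetical.

```python
from fractions import Fraction

def mul(u, v):
    """Product of two elements given by coordinates w.r.t. (1, sqrt2, sqrt3, sqrt6)."""
    out = [Fraction(0)] * 4
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            # Index i stands for sqrt2^(i % 2) * sqrt3^(i // 2).
            s, t = (i % 2) + (j % 2), (i // 2) + (j // 2)
            factor = (2 if s == 2 else 1) * (3 if t == 2 else 1)
            out[(s % 2) + 2 * (t % 2)] += factor * Fraction(ui) * Fraction(vj)
    return out

def dependency(vectors):
    """Coefficients c, not all zero, with sum(c[i] * vectors[i]) = 0.

    Requires more vectors than coordinates; uses Gauss-Jordan elimination.
    """
    dim, count = len(vectors[0]), len(vectors)
    rows = [[Fraction(v[r]) for v in vectors] for r in range(dim)]
    pivots, rank = [], 0
    for col in range(count):
        pivot = next((r for r in range(rank, dim) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        rows[rank] = [x / rows[rank][col] for x in rows[rank]]
        for r in range(dim):
            if r != rank and rows[r][col] != 0:
                rows[r] = [x - rows[r][col] * y for x, y in zip(rows[r], rows[rank])]
        pivots.append(col)
        rank += 1
    free = next(c for c in range(count) if c not in pivots)
    coeffs = [Fraction(0)] * count
    coeffs[free] = Fraction(1)
    for r, col in enumerate(pivots):
        coeffs[col] = -rows[r][free]
    return coeffs

b = [0, 1, 1, 0]                       # sqrt2 + sqrt3
powers = [[Fraction(1), 0, 0, 0]]      # b^0
for _ in range(4):
    powers.append(mul(powers[-1], b))  # b^1, ..., b^4
print(dependency(powers))              # coefficients c_0, ..., c_4 of b^4 - 10 b^2 + 1
```

The relation found is $b^4 - 10b^2 + 1 = 0$, so b is a zero of a polynomial of degree 4 with rational coefficients.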


Remark. An extension of K that is algebraic over K is not necessarily
of finite dimension. For example, the field A of all algebraic numbers
(over the field $\Pi$ of rational numbers) is not of finite dimension, since
$\Pi[x]$ contains irreducible polynomials of arbitrarily high degree, e.g.,
$\Phi_p(x) = x^{p-1} + \cdots + x + 1$, for prime p (cf. §9.2), so that A cannot
have a basis of finitely many elements over $\Pi$.

Examples of finite extensions are provided by the following theorem: 


Theorem 2. (1) Every extension of an arbitrary field K that can be 
generated by the adjunction of finitely many elements algebraic over K 
is finite (and therefore algebraic) over K. Conversely, every finite extension 
can be generated by the adjunction of finitely many elements algebraic 
over K. 


(2) If the arguments of a rational function over K are algebraic over K, 
then the values of the function are also algebraic over K. 


Proof. (1) Let $K(a_1, \ldots, a_k)$ be the given extension, where the $a_\kappa$ are
algebraic over K, $\kappa = 1, \ldots, k$. For k = 1 the assertion is true by the
preceding theorem. We now argue by complete induction on k. Assume
that the assertion is already proved for all $k = 1, \ldots, m$ ($m \geq 1$).
Then it is also true for k = m + 1, since by the induction hypothesis
$K' = K(a_1, \ldots, a_m)$ has a basis $\eta_1, \ldots, \eta_r$ and, if we set $a_{m+1} = a$, then for
the case K'(a) = K' there is nothing to prove; and otherwise the element a,
since it is algebraic over K, is a zero of the polynomial $z(x) \in K[x] \subseteq K'[x]$
and is thus algebraic over K'. By the preceding theorem K'' = K'(a)
has a basis over K', say $\theta_1, \ldots, \theta_t$. For every $b \in K''$ we thus have
$b = c_1 \theta_1 + \cdots + c_t \theta_t$ with $c_\tau \in K'$. Moreover, $c_\tau = d_{\tau 1} \eta_1 + \cdots + d_{\tau r} \eta_r$
with $d_{\tau\rho} \in K$, so that $b = \sum_{\tau,\rho} d_{\tau\rho} \eta_\rho \theta_\tau$. Here the $r \cdot t$ elements $\eta_\rho \theta_\tau$ are
linearly independent over K, since b = 0 implies $c_\tau = 0$ and thus $d_{\tau\rho} = 0$.
Since the $b \in K''$ were arbitrary, the elements $\eta_\rho \theta_\tau$ form a basis for K''
over K.

The converse assertion follows from the fact that every finite extension, 
since it is algebraic, can be generated by the adjunction of the algebraic 
basis elements. 


(2) Since $a_1, \ldots, a_n$ are algebraic over K, it follows that $K'' = K(a_1, \ldots, a_n)$
is algebraic over K (assertion (1)) and therefore contains all values of 
rational functions as described in assertion (2). 

From the proof of assertion (1) of the preceding theorem it also follows
that if r and t are the dimensions of K' over K and K'' over K', respectively,
then K'' has the dimension $r \cdot t$ over K. The dimension m over K of a
finite extension E of K is also called the degree of E over K, or in symbols:
m = degree (E : K). Thus we have the following theorem:


Theorem 3. (I) If E' or E'' is a finite extension of K or of E', respectively,
then E'' is also a finite extension of K.


(II) Furthermore, deg (E'' : K) = deg (E'' : E') · deg (E' : K).


1. Remark. If K' is algebraic over K, and K'' is algebraic over K',
then K'' is also algebraic over K. Proof: Every element of K'' is a zero
of a polynomial with coefficients in K’, and each of these finitely many 
coefficients is algebraic over K and thus belongs to a finite extension of K, 
so that the assertion follows from theorem 3 of §2. 


2. Remark. If K is of characteristic p = 0 or p > 2 and if degree
(E : K) = 2, then E = K(a), where a is a zero of a polynomial of second
degree $x^2 + c$ ($c \in K$) that is irreducible over K.


Exercises 


24. Let $K = \Pi^{(0)}$ and let $\xi_1$ and $\xi_2$ be two distinct zeros of $x^3 + 6x + 2$.
Set $K(\xi_1) = K_1$ and $K_1(\xi_2) = \bar{K}$. Determine $\deg(K_1 : K)$, $\deg(\bar{K} : K_1)$,
$\deg(\bar{K} : K)$. Now let $\eta$ be a zero of $x^2 + 3$ and show that $\eta$ can be
chosen in $\bar{K}$, so that $K^+ = K(\eta)$ is thus a subfield of $\bar{K}$. Determine
$\deg(\bar{K} : K^+)$ and $\deg(K^+ : K)$. (cf. exs. 7, 21.)

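25. Again let $K = \Pi^{(0)}$. The polynomial $x^6 + 10x^3 + 125$ has six distinct
zeros (in a suitable extension). Let $\bar{K} = K(\xi_1, \xi_2, \xi_3, \xi_4, \xi_5, \xi_6)$ and
determine $\deg(\bar{K} : K)$. Prove that the zeros $\varepsilon_1$ and $\varepsilon_2 = \varepsilon_1^2$ of the
polynomial $x^2 + x + 1$ can be chosen in $\bar{K}$ and that we may set
$\xi_2 = \varepsilon_1 \xi_1$, $\xi_3 = \varepsilon_2 \xi_1$, $\xi_4 = 5/\xi_1$, $\xi_5 = \varepsilon_1 \xi_4$, $\xi_6 = \varepsilon_2 \xi_4$. Determine
$\deg(\bar{K} : K(\xi_1))$, $\deg(K(\xi_1) : K)$, $\deg(\bar{K} : K(\varepsilon_1))$, $\deg(K(\varepsilon_1) : K)$ and show
that $\bar{K} = K(\xi_1, \varepsilon_1)$.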


3. Normal Extensions 


A smallest splitting field Z of an irreducible polynomial in K[x] has 
the further property that it is normal over K, or is a normal (or Galois) 
extension of K. Here an extension N of K is said to be normal over K if (a) N
is algebraic over K and (b) every polynomial g(x) that is irreducible in
K[x] can be factored completely into linear factors in N[x] provided that N
contains (at least) one zero of g(x). The above assertion concerning Z 
is then a special case of the following theorem. 


Theorem 1. Criterion for normal (not necessarily finite) extensions.
The extension P of K is normal over K if and only if P can be generated
by the adjunction of all the zeros of (arbitrarily many) polynomials in K[x].


1. Corollary. In particular, every smallest splitting field of a polynomial
in K[x] is normal over K. 


Proof. (A) Sufficiency. (a) The field P is algebraic over K; for every
$a \in P$ is contained (cf. §1.1) in an extension $K(b_1, \ldots, b_m)$ of K, where
the $b_\mu$ are zeros of polynomials in K[x], and therefore (§2, Theorem 2)
the element a, and with it the whole field P, is algebraic over K.

(b) Let g(x) be irreducible in K[x] with g(a) = 0 and $a \in P$. Then
we must show that g(x) falls into linear factors in P[x]. But by (a)
the field P contains elements $b_1, \ldots, b_m$, algebraic over K, such that
$a \in K(b_1, \ldots, b_m)$ and such that the irreducible polynomial $h_\mu(x) \in K[x]$
determined by $h_\mu(b_\mu) = 0$ falls into linear factors $(x - b_{\mu\tau})$ in P[x],
$\tau = 1, \ldots, t_\mu$; $\mu = 1, \ldots, m$. (For by the definition of P, if the $b_\mu$ are in P,
then so are all the zeros of those irreducible polynomials in K[x] for
which the $b_\mu$ are zeros.) Now let $h(x) \in K[x]$ be the product of all the
$h_\mu(x)$, and let $c_1, \ldots, c_n$ be all the zeros of h(x). Since


$a \in K(b_1, \ldots, b_m) \subseteq K(c_1, \ldots, c_n) \subseteq P$


we have $a = r(c_1, \ldots, c_n)$, where r denotes a rational entire function of
n arguments with coefficients in K (§1.3, Theorem 1). If we permute the
$c_1, \ldots, c_n$ in all n! ways, we obtain n! elements $a_\nu \in P$ from $r(c_1, \ldots, c_n)$.
The coefficients of the polynomial $f(x) = (x - a_1) \cdots (x - a_{n!})$ are
symmetric in the $c_\nu$ and thus belong to K (since $h(x) \in K[x]$) (see IB4, §2.4),
where we agree to set $a_1 = a$. Since g(a) = f(a) = 0 and since g(x) is
irreducible in K[x], it follows that g(x) is a divisor of f(x), so that the
zeros of g(x) are included among the $a_\nu$ and consequently belong to P;
but this means that g(x) falls into linear factors in P[x].

(B) Necessity. Let K' be a normal extension of K, so that K' is
algebraic over K by definition. Then every $a \in K'$ is a zero of an irreducible
polynomial $g(x) \in K[x]$ and all the zeros of g(x) belong to K' (since K'
is normal). Thus every $a \in K'$, and therefore the entire field K' itself,
is obtained by adjunction of all the zeros of all the g(x).


2. Corollary. If K' is a finite normal extension of K, then K' can be
obtained from K by the adjunction of all the zeros of a single polynomial. 

Consequence. Every finite extension E of K is contained in a normal
extension N* of K. For if $b_1, \ldots, b_r$ is a basis of E over K and if $h_\rho(b_\rho) = 0$
with $h_\rho(x) \in K[x]$ irreducible in K[x], $\rho = 1, \ldots, r$, let us adjoin all the
zeros of all the polynomials $h_\rho(x)$. By theorem 1, §3 the extension N* of
K obtained in this way is normal over K (and also over E), and furthermore
$E \subseteq N^*$.

Consequently we make the following convention:


Convention. Whenever in any of the following sections we are dealing 
with a finite extension, we shall agree, even without explicit mention of 
this fact, that we are operating in a normal extension field N* of the 
given finite extension. 


Remark. In the sufficiency part §3(A) of the proof for theorem 1 the use
of the theory of symmetric functions in (b) can be avoided if we have recourse
instead to the theory of isomorphic mappings in §6, as follows. Let g(x) be
irreducible in K[x] and let g(a) = 0 with $a \in K' = K(c_1, \ldots, c_n)$ (see the
proof (A), (b)). Also let there exist a b not belonging to K', for which g(b) = 0.
But K(a) and K(b) are isomorphic over K (see §6) and thus $K'(a) = K(a, c_1, \ldots, c_n)$
and $K'(b) = K(b, c_1, \ldots, c_n)$ are also isomorphic over K. In the isomorphic
mapping of K'(a) onto K'(b) over K the elements $c_1, \ldots, c_n$ are only permuted
among themselves. Also, since $a = r(c_1, \ldots, c_n)$, where r is a rational function
in n arguments with coefficients in K, it follows that $b = \bar{r}(c_1, \ldots, c_n)$ with
$\bar{r}$ rational over K, so that $b \in K'$, which contradicts the assumption.


Exercises 


26. In exs. 19 to 25 determine which of the fields are normal extensions
of $\Pi^{(0)}$.


4. Separable Extensions 


For what follows it is often important to know under what conditions 
we may conclude that if a polynomial is irreducible in K[x] all its zeros 
are distinct. If the zeros of an irreducible polynomial in K[x] are all 
distinct (so that the number of distinct zeros of the polynomial is equal 
to its degree), the polynomial is said to be separable over K. (Examples 
of nonseparable polynomials will be given below.) If g(x) is separable 
over K and if K' is an arbitrary extension of K, then the factors of g(x)
which are irreducible over K' are likewise separable (over K'). Every
zero of a polynomial that is separable over K is also said to be separable
over K, and similarly every extension of K whose elements are all separable
over K is said to be separable over K; and every $a \in K$ is said to be separable
over K. Separable elements and extensions are algebraic (over K) by
definition. Moreover, every finite separable extension K' of K is a simple
extension, i.e., can be represented in the form K' = K(a), as will be proved
below (cf. §6, theorem 3). We also make the following remark. The
extension $K(a_1, \ldots, a_n)$ is separable over K if (and only if) each of the
elements $a_1, \ldots, a_n$ is separable over K. (The proof, which will not be
given here, depends on the fact that only for separable extensions is the
number of isomorphisms of $L = K(a_1, \ldots, a_n)$ over K (cf. §6) equal to
degree (L : K).)


4.1. Preliminary Remark. Computation in Integral Domains J with
Characteristic p > 0

(1) For every $a \in J$ the sum $a + \cdots + a = p \cdot a = 0$ if the left-hand
side has p summands a. In particular, it follows for p = 2 that a = −a.

(2) For arbitrary $a_1, \ldots, a_k \in J$ and an arbitrary integer f > 0 we have
$(a_1 + \cdots + a_k)^r = a_1^r + \cdots + a_k^r$, if $r = p^f$.

Proof. In view of (1) the desired assertion follows for k = 2 and
f = 1 from the binomial theorem (IB4, §1.3) and the fact that the binomial
coefficients are divisible by p (cf. IB6, §4.4 (29)). But if the assertion holds
for k = 2 and for f, then it also holds for k = 2 and f + 1, since for
$r' = p^{f+1}$ we have $(a_1 + a_2)^{r'} = ((a_1 + a_2)^r)^p = (a_1^r + a_2^r)^p = a_1^{r'} + a_2^{r'}$.
Finally, if the assertion holds for a given k and (arbitrary) f, then it holds
for k + 1 and f, since


$(a_1 + \cdots + a_{k+1})^r = ((a_1 + \cdots + a_k) + a_{k+1})^r$
$= (a_1 + \cdots + a_k)^r + a_{k+1}^r = a_1^r + \cdots + a_k^r + a_{k+1}^r$.
Furthermore,
(2a) $(a_1 - a_2)^r = a_1^r - a_2^r$.


For it follows from (2) that $a_1^r = ((a_1 - a_2) + a_2)^r = (a_1 - a_2)^r + a_2^r$.
(3) For $a \in K$ we have $a^p = a$ if (and only if) a belongs to the prime field
$\Pi$ of K. (Since every $a \in \Pi^{(p)}$ can be written as the sum of unit elements,
it follows from (2) that $a^p = (1 + \cdots + 1)^p = 1 + \cdots + 1 = a$. Conversely,
if $a^p = a$, then a is a zero of $x^p - x$; but this polynomial can
have at most p distinct zeros, and by what has just been proved the
$a \in \Pi^{(p)}$ already account for exactly p distinct zeros.)

(4) The derivative h'(x) of a polynomial $h(x) \in J[x]$ of at least the first
degree is the identically vanishing function if and only if J is of characteristic
p > 0 and h(x) is a polynomial in $x^p$ with coefficients in J; in other words,
$h(x) = g(x^p)$ with $g(x) \in J[x]$.


Proof. Let $h(x) = a_0 + a_1 x + \cdots + a_n x^n$, $n \geq 1$, with $a_\nu \in J$ and
$a_n \neq 0$. Then $h'(x) = a_1 + 2a_2 x + \cdots + n a_n x^{n-1}$. If p = 0, then
$n a_n \neq 0$, so that $h'(x) \neq 0$. For p > 0, on the other hand, h'(x) = 0
if and only if $\nu a_\nu = 0$, $\nu = 1, \ldots, n$. But this condition is automatically
fulfilled for $\nu \equiv 0$ (mod p), so that $a_\nu = 0$ for $\nu \not\equiv 0$ (mod p) is a necessary
and sufficient condition for h'(x) = 0. But then $h(x) = \sum_{\mu} a_{\mu p} x^{\mu p}$, from which the
assertion follows.
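
The rules (2), (3) and (4) can be checked numerically in the simplest integral domain of characteristic p, namely the residue classes mod p. The following is a minimal sketch under that assumption; the choice p = 5, the sample polynomial, and the helper names `frobenius_additive` and `derivative_mod_p` are hypothetical and serve only as an illustration.

```python
from math import comb

def frobenius_additive(p, f, a, b):
    """Check (a + b)^r == a^r + b^r in Z/pZ for r = p**f."""
    r = p ** f
    return pow(a + b, r, p) == (pow(a, r, p) + pow(b, r, p)) % p

def derivative_mod_p(coeffs, p):
    """Formal derivative of a_0 + a_1 x + ... (low degree first), taken mod p."""
    return [(nu * a) % p for nu, a in enumerate(coeffs)][1:]

p = 5
# The inner binomial coefficients C(p, k), 0 < k < p, are divisible by p.
inner_binomials_vanish = all(comb(p, k) % p == 0 for k in range(1, p))
# Rule (2) for two summands and f = 1, 2, checked on all residues.
additive = all(frobenius_additive(p, f, a, b)
               for f in (1, 2) for a in range(p) for b in range(p))
# Rule (3): a^p = a for every element of the prime field.
fixed = all(pow(a, p, p) == a for a in range(p))
# Rule (4): h(x) = 3 + 2x^5 + x^10 is a polynomial in x^p, so h'(x) vanishes.
h = [3, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1]
h_prime = derivative_mod_p(h, p)
```

A polynomial such as $x + x^2$, which is not a polynomial in $x^5$, keeps a nonzero derivative $1 + 2x$ under the same function.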


4.2. If K is of characteristic zero, then every irreducible polynomial
is separable, as follows from the next theorem.


Theorem 1. Criterion for separability. (I) An irreducible polynomial
g(x) in K[x] is separable over K if and only if either (a) the field K is of
characteristic 0 or else (b) the field K is of characteristic p > 0 and g(x)
cannot be written as a polynomial in $x^p$ over K.

(II) All the zeros of an irreducible polynomial n(x) in K[x] have the same
multiplicity. For p > 0 and nonseparable n(x) this multiplicity is a power
$q = p^e$, $e \geq 1$, of the characteristic p of K. Moreover, $n(x) = h(x^q)$,
where $h(y) \in K[y]$ is irreducible and separable over K.


Proof. By IB4, §2.2 the polynomial g(x) has at least one nonsimple
zero a if and only if g(x) and g'(x) have a common zero and therefore
are not coprime* (IB5, §2.9, especially (27)). Since $g'(x) \in K[x]$ and g'(x)
is of lower degree than g(x) (in case $g'(x) \neq 0$), and since g(x) is assumed
to be irreducible in K[x], we must have g'(x) = 0 if g(x) has multiple
zeros. But then the assertion (I) follows from §4.1, (4).


We now assume that p > 0 and that, if $g(x) = \sum_{\mu=0}^{n} a_\mu x^\mu$ has multiple
roots, then $q = p^e$, with $e \geq 1$, is the highest power of p that divides all those
$\mu$ for which $a_\mu \neq 0$. Then we have $g(x) = (x^q - b_1) \cdots (x^q - b_t) = h(x^q)$.
Here the zeros $b_1, \ldots, b_t$ of h(y) are distinct, since otherwise h(y) would
have multiple zeros and it would follow from (I) that $h(y) = k(y^p)$ and
$g(x) = m(x^{pq})$, in contradiction to the definition of q. But now if $d_\tau$
is a zero of $x^q - b_\tau$, we have $g(x) = [(x - d_1) \cdots (x - d_t)]^q$, where all
the $d_\tau$ are distinct, which proves the first part of (II). But h(y) is irreducible
over K, since $h(y) = h_1(y) \cdot h_2(y)$ implies $g(x) = h_1(x^q) \cdot h_2(x^q)$, which
means that if g(x) is irreducible, then so is h(y). But then, since the zeros
$b_1, \ldots, b_t$ of h(y) are distinct, h(y) is separable.


Example of a nonseparable polynomial. Let $K^{0}$ be of characteristic
p > 0, and let z be an indeterminate over $K^{0}$. Then the polynomial
$x^p - z$ is irreducible over $K^{0}(z)$ (for the proof see, for example, Haupt [1],
13.3, theorem 3); on the other hand, it follows from $a^p - z = 0$ that
$x^p - z = x^p - a^p = (x - a)^p$, so that $x^p - z$ has a zero of multiplicity p
with $p \geq 2$.


Exercises 


27. Let z be an indeterminate over $\Pi^{(5)}$, $K = \Pi^{(5)}(z)$. Prove that the
polynomials $x^2 + x + z$ and $x^{10} + x^5 + z$ are both irreducible over
K, but only the first one is separable.


* Two polynomials are said to be coprime if they have no common factor of positive 
degree. 


7 Algebraic Extensions of a Field 425 


5. Roots of Unity 


5.1. Definition of the hth Roots of Unity 


Introductory Remarks. As the coefficient domain we take an arbitrary 
field K. Then by an hth root of unity we mean a zero³ of the polynomial


(1) $f_h(x) = x^h - 1$,


where 1 is the unit element of K (h is a natural number).

The coefficients of $f_h(x)$ belong to the prime field $\Pi$ of K, for which
in the case of characteristic 0 we take the field $\Pi^{(0)}$ of rational numbers
and in the case of characteristic p⁴ the field $\Pi^{(p)}$ of residue classes mod p
(in the ring of integers; see IB5, §3.7 and IB6, §4.1).

From the algebraic point of view it is then natural to ask the following 
important questions: 


1. To what extent can $f_h(x)$ be factored over $\Pi$ (and over K)?


2. Starting from $\Pi$ (or from K), how do we obtain a smallest splitting
field for $f_h(x)$? What can be said about the structure of this field?


In sections 5, 8, and 9 we shall obtain far-reaching answers to these 
questions. 

The roots of unity are of great importance for many problems in 
arithmetic and algebra. It is obvious that they are closely connected with 
the theory of “pure equations,” i.e., with the problem of determining 
the zeros of the polynomial 


(2) $f_{h,a}(x) = x^h - a$.
If $a \neq 0$,⁵ we have the following theorem.


Theorem 1. If $\alpha$ is a zero⁶ of (2), and if $\zeta$ is an hth root of unity, then
$\alpha \cdot \zeta$ is also a zero of (2). Moreover, if $\alpha_1$ and $\alpha_2$ are two (not necessarily
distinct) zeros of (2), then $\alpha_2/\alpha_1$ is an hth root of unity. Thus we can obtain
all the distinct zeros of (2) by multiplying any one of them with all the distinct
hth roots of unity.
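
Theorem 1 can be tried out in a finite field. The sketch below assumes, purely for illustration, that K is the field of residue classes mod 13 and that we solve $x^3 = 8$; these values are hypothetical and were chosen only because the pure equation then has zeros in K itself.

```python
p, h, a = 13, 3, 8   # example values: work in Z/13Z and solve x^3 = 8

# The hth roots of unity and the zeros of x^h - a that lie in Z/pZ.
roots_of_unity = [z for z in range(1, p) if pow(z, h, p) == 1]
zeros = [x for x in range(1, p) if pow(x, h, p) == a]

# Multiply one zero by every root of unity; this should give all zeros.
alpha = zeros[0]
from_one_zero = sorted(alpha * z % p for z in roots_of_unity)

# The quotient of any two zeros should again be an hth root of unity.
quotients_are_roots = all(
    pow(x2 * pow(x1, -1, p) % p, h, p) == 1 for x1 in zeros for x2 in zeros
)
```

Here the third roots of unity are 1, 3, 9, and multiplying the zero 2 by them reproduces the full list of zeros 2, 5, 6.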


In §5.2 we shall discuss the conditions under which the polynomial (1) 
has multiple zeros; the reader will have no difficulty in deriving the 
corresponding results for the polynomial (2). 


3 Belonging to K or to a suitable extension of K. 

4 In sections 5, 8, and 9 the number p is always a positive prime.
5 The case a = 0 is of no interest.

6 Belonging to K or to a suitable extension of K.


Note. It is well known that in the field of complex numbers the hth
roots of unity can be represented in the following form (cf. IB8, (5)):


(3) $\zeta_k = e^{2\pi i k/h} = \cos k \frac{2\pi}{h} + i \sin k \frac{2\pi}{h}$ $(k = 0, 1, \ldots, h - 1)$.
In the Gaussian plane they are represented by the vertices of a regular
h-gon with its center at the origin and one of its vertices at the intersection
of the positive real axis with the unit circle. Although this representation
is advantageous for many purposes, we shall not make use of it here
(in sections 5, 8 and 9), even for the case of characteristic 0, but shall
develop a purely algebraic theory of the hth roots of unity.


5.2. Multiplicity of the Zeros of $f_h(x)$


As indicated above, we shall now undertake to find out when the
polynomial (1) has multiple zeros. As a criterion for this purpose we make
use of the following well-known theorem (cf. §4.2, beginning of the proof).

The polynomial f(x) in K[x] has multiple zeros⁷ if and only if f(x) has
a common factor of positive degree with f'(x); in other words, f(x) has no
multiple zero if and only if it is coprime with f'(x).

We form the derivative


(4) $f_h'(x) = h \cdot x^{h-1}$.


It is obvious that in general this derivative has no common zero with (1) 
and is therefore coprime with (1). An exception occurs only if K is of 
prime characteristic p and p is a factor of h. Before dealing with this 
exceptional case we here present the main result. 


Theorem 2. If K is of characteristic 0, then $f_h(x)$ has only simple zeros.
The same situation holds if K is of prime characteristic p but p is not a factor
of h.

If either one of the hypotheses of theorem 2 is satisfied, i.e., if the 
characteristic is equal to 0 or if the positive characteristic p is not a factor 
of h, we speak of the principal case. 

Now let us turn to the exceptional case: the characteristic is p and $p \mid h$.
Let the highest power of p that is a factor of h be $p^f$, so that $h = p^f \cdot \bar{h}$
and $p \nmid \bar{h}$ (with $f \geq 1$). Then, as we shall show in §8,⁸ we have


(5) $f_h(x) = [f_{\bar{h}}(x)]^{p^f}$.


Since by theorem 2 the polynomial $f_{\bar{h}}(x)$ has only simple zeros, it follows
that $f_h(x)$ has the same zeros as $f_{\bar{h}}(x)$, but each of them is of multiplicity $p^f$.
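
Formula (5) can be verified by direct multiplication of polynomials with coefficients taken mod p. The following sketch is an illustration only; the values p = 3 and h = 18 (so that the power of p is 9 and the cofactor is 2) and the helper names `poly_mul`, `poly_pow` and `f_h` are hypothetical choices for this example.

```python
def poly_mul(f, g, p):
    """Multiply coefficient lists (low degree first) over Z/pZ."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] = (out[i + j] + fi * gj) % p
    return out

def poly_pow(f, n, p):
    """Raise a polynomial to the n-th power over Z/pZ by repeated multiplication."""
    result = [1]
    for _ in range(n):
        result = poly_mul(result, f, p)
    return result

def f_h(h, p):
    """Coefficient list of x^h - 1 over Z/pZ."""
    return [(-1) % p] + [0] * (h - 1) + [1]

# Example: p = 3, h = 18 = 3^2 * 2, so f = 2 and the cofactor h_bar = 2.
p, f, h_bar = 3, 2, 2
h = p ** f * h_bar
lhs = f_h(h, p)                             # x^18 - 1 over Z/3Z
rhs = poly_pow(f_h(h_bar, p), p ** f, p)    # (x^2 - 1)^9 over Z/3Z
```

Both coefficient lists agree, so over Z/3Z the zeros of $x^{18} - 1$ are those of $x^2 - 1$, each with multiplicity 9.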


7 In K or in a suitable extension.
8 Independently, of course, of the present section.


Exercises


28. Let $K = \Pi^{(3)}$ and show that in K[x] we have
$f_{12}(x) = (x - 1)^3 \cdot (x + 1)^3 \cdot (x^2 + 1)^3$ and $f_{36}(x) = [f_4(x)]^9$.


5.3. The Group $\mathfrak{G}_h$ of hth Roots of Unity


In this subsection we confine ourselves to the principal case, so that
by theorem 2 $f_h(x)$ has only simple zeros. Thus if both $\zeta_1$ and $\zeta_2$ are
hth roots of unity (possibly with $\zeta_1 = \zeta_2$), then $\zeta_1/\zeta_2$ is also an hth
root of unity, from which it readily follows that under multiplication
the set of hth roots of unity forms an Abelian group of order h. We shall
denote this group by $\mathfrak{G}_h$. Now let d be a (positive) factor of h. We first
note that if (as we are now assuming) $f_h(x)$ comes under the principal case,
then the same is true of $f_d(x)$. Thus we can speak of the group $\mathfrak{G}_d$. Also,
since $\zeta^d = 1$ implies $\zeta^h = 1$, the group $\mathfrak{G}_d$ is a subgroup of $\mathfrak{G}_h$. The
order of each element of $\mathfrak{G}_h$ is a factor of h. An hth root of unity $\zeta$ is
called a primitive hth root of unity if it is of order h or, in other words,
if no positive integer g < h exists such that $\zeta^g = 1$. We shall denote the
number of primitive hth roots of unity by $\psi(h)$.⁹ Then, since the order
of each element of $\mathfrak{G}_h$ is a factor of h and every primitive dth root of unity
(with $d \mid h$) is an hth root of unity, we have


(6) $\sum_{d \mid h} \psi(d) = h$.


We now show that this result implies


(7) $\psi(h) = \varphi(h)$,
where $\varphi(h)$ is the Euler function (see IB6, §4.2). The proof of (7) is by
complete induction, which we first carry through for characteristic zero.


I. It is obvious that there is exactly one primitive first root of unity,
namely 1. Thus $\psi(1) = \varphi(1) = 1$.

II. Now assume that h > 1 and (7) is true for all smaller numbers.
Then it follows from (6) that


(8) $\psi(h) = h - \sum_{d \mid h,\, d \neq h} \psi(d) = h - \sum_{d \mid h,\, d \neq h} \varphi(d)$.
But the Euler function (see IB6, §5, theorem 7) satisfies the well-known
equality, corresponding to (6),
(9) $\sum_{d \mid h} \varphi(d) = h$,
from which we obtain


(10) $\varphi(h) = h - \sum_{d \mid h,\, d \neq h} \varphi(d)$,


so that the desired assertion follows directly from a comparison of (8)
and (10).


9 It will turn out that this number (for the principal case) does not depend on K and
is thus independent, in particular, of the characteristic of K.


It is easy to see that this method of proof is also valid for a prime
characteristic p, provided (principal case) that h is not divisible¹⁰ by p.

Thus (7) is completely proved for the principal case. From (7) it follows
in particular that $\psi(h) > 0$, so that $\mathfrak{G}_h$ is a cyclic group (see IB2, §5) of
order h. Thus the structure of the group $\mathfrak{G}_h$ is completely determined,
and we have the following theorem.
and we have the following theorem. 


Theorem 3. The multiplicative group $\mathfrak{G}_h$ of the hth roots of unity is a
cyclic group of order h. Thus it is isomorphic to the additive group (the
module) of residue classes of the ring of integers mod h.
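
The counting argument in (6) and (7) and the statement of Theorem 3 can be observed in a concrete case. As a sketch under our own assumptions, we take K to be the residue classes mod 13 and h = 12 (the principal case, since 13 is not a factor of 12); the helper names `euler_phi` and `order` are hypothetical.

```python
from math import gcd

def euler_phi(n):
    """Euler function: how many 1 <= k <= n are coprime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def order(z, p):
    """Multiplicative order of z in (Z/pZ)*, for z not divisible by p."""
    k, power = 1, z % p
    while power != 1:
        power = power * z % p
        k += 1
    return k

# Example: K = Z/13Z and h = 12.
p, h = 13, 12
group = [z for z in range(1, p) if pow(z, h, p) == 1]   # the hth roots of unity

# psi[d] counts the primitive dth roots of unity, for each factor d of h.
psi = {d: sum(1 for z in group if order(z, p) == d)
       for d in range(1, h + 1) if h % d == 0}

# Primitive hth roots of unity; each of them generates the whole group.
generators = [z for z in group if order(z, p) == h]
```

In this example the counts agree with the Euler function for every factor d of 12, they add up to 12 as in (6), and the four primitive twelfth roots of unity are 2, 6, 7, 11.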


Exercises 


29. Let K be of characteristic $\neq 2$. Prove that there exist two primitive
fourth roots of unity and that they are the zeros of the polynomial
$x^2 + 1$. Construct $\mathfrak{G}_4$.


30. Let K be of characteristic $\neq 2, 3$. Prove that there exist four
primitive twelfth roots of unity and that they are the zeros of the
polynomial $x^4 - x^2 + 1$. Construct $\mathfrak{G}_{12}$. Prove that if $\zeta$ is a primitive
twelfth root of unity, then $\zeta^2$ is a primitive sixth root and $\zeta^3$ is a
primitive fourth root.


5.4. The Cyclotomic Polynomial $\Phi_h(x)$


In this subsection we again restrict ourselves to the principal case.
We now introduce the polynomial, called a cyclotomic polynomial,
whose zeros are the primitive hth roots of unity (each with multiplicity 1)
and whose leading coefficient is 1. We shall denote this polynomial by
$\Phi_h(x)$ or also, in order to emphasize its dependence on the characteristic
of the field K, by $\Phi_h^{(0)}(x)$ or $\Phi_h^{(p)}(x)$. We now prove the following theorem.


Theorem 4. The coefficients of $\Phi_h(x)$ belong to the prime field $\Pi$ of K.


We first note that the argument leading to (6) indicates the following
connection between the polynomials $f_h(x)$ and the cyclotomic polynomials:


(11) $\prod_{d \mid h} \Phi_d(x) = f_h(x)$.


10 If h is divisible by p, we can set $\psi(h) = 0$.


It is easy to see that each of the factors in the product on the left-hand
side of (11) is a factor of $f_h(x)$, that any two of these factors are coprime,
and finally that every zero of $f_h(x)$ is a zero of one of these factors.

The proof of theorem 4 now follows by complete induction, where
again in the case of characteristic p we must restrict ourselves to integers h
that are not divisible by p.¹¹


I. We have $\Phi_1(x) = f_1(x) = x - 1$, so that the coefficients of $\Phi_1(x)$
belong to $\Pi$.


II. Now assume that h > 1 and that the assertion is true for all smaller
numbers. For characteristic p we again restrict ourselves to the case $p \nmid h$.
From (11) we have


(12) $\Phi_h(x) \cdot \prod_{d \mid h,\, d \neq h} \Phi_d(x) = f_h(x)$.


For abbreviation, we set the second factor on the left-hand side of (12)
equal to $P_h(x)$, so that (12) becomes


(13) $\Phi_h(x) \cdot P_h(x) = f_h(x)$.


From the induction hypothesis it is easy to see that all the coefficients
of $P_h(x)$ belong to $\Pi$ and that the same is true for $f_h(x)$, and consequently
(for example, by the division algorithm) the desired statement is also
true for $\Phi_h(x)$. Thus the proof of theorem 4 is complete.

For characteristic 0 this theorem can be sharpened as follows. 


Theorem 5. If K is a field of characteristic 0, then $\Phi_h(x)$ has integral
coefficients.


It is only necessary to repeat the steps of the above proof and to note
that in the application of the division algorithm the coefficient of the
highest power of x in $P_h(x)$ is 1.

We now investigate the relationship between $\Phi_h^{(0)}(x)$ and $\Phi_h^{(p)}(x)$.
Of fundamental importance here is the fact that there exists exactly one
homomorphism $H^{(p)}$ of the ring of integers G onto $\Pi^{(p)}$. The homomorphism
$H^{(p)}$ is obtained by setting the integer g in correspondence
with the residue class mod p determined by g. It is easy to show that (with
respect to addition and multiplication) this correspondence is a homomorphism
(cf. IB5, §3.7, and IB6, §4.1) and that no other homomorphism
can exist (the image of the number 1, which must be the unit element
of $\Pi^{(p)}$, already determines the image of every integer). We now have
the following theorem.


11 For h divisible by p we can set $\Phi_h(x) = 1$.


Theorem 6. If the homomorphism $H^{(p)}$ is applied to the coefficients of
$\Phi_h^{(0)}(x)$, the polynomial $\Phi_h^{(0)}(x)$ becomes $\Phi_h^{(p)}(x)$.


For the proof it is only necessary to repeat the steps of the proof of
theorem 4.


5.5. Computation of the Cyclotomic Polynomials. We note that this proof,
in particular the steps (12) and (13), actually enables us to calculate the
cyclotomic polynomials. As an example we calculate the case h = 12, where
we must avoid the characteristics 2 and 3. Since 12 has the factors 1, 2, 3, 4, 6, 12,
the individual steps are as follows (we leave the multiplication and division
to the reader).


I. $\Phi_1(x) = f_1(x) = x - 1$
II. $P_2(x) = \Phi_1(x) = x - 1$; $\Phi_2(x) = x + 1$
III. $P_3(x) = \Phi_1(x) = x - 1$; $\Phi_3(x) = x^2 + x + 1$
IV. $P_4(x) = \Phi_1(x) \cdot \Phi_2(x) = x^2 - 1$; $\Phi_4(x) = x^2 + 1$
V. $P_6(x) = \Phi_1(x) \cdot \Phi_2(x) \cdot \Phi_3(x) = x^4 + x^3 - x - 1$; $\Phi_6(x) = x^2 - x + 1$
VI. $P_{12}(x) = \Phi_1(x) \cdot \Phi_2(x) \cdot \Phi_3(x) \cdot \Phi_4(x) \cdot \Phi_6(x) = x^8 + x^6 - x^2 - 1$;
$\Phi_{12}(x) = x^4 - x^2 + 1$


These relations hold for all characteristics other than 2 and 3. The reader
may illustrate theorem 6 for the case h = 12, p = 7.

For characteristic 2 every twelfth root of unity is also a third root, and for 
characteristic 3 every twelfth root is also a fourth root, so that in the first case 
we may use formulas I and III, and in the second, I, II, and IV. 
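
The recursive procedure given by (12) and (13) is easy to mechanize. The following is a minimal sketch over the integers (characteristic 0), not a definitive implementation; the helper names `poly_mul`, `poly_div_exact` and `cyclotomic` and the coefficient-list representation (low degree first) are our own assumptions. The division step relies on the fact, noted for theorem 5, that the leading coefficient of $P_h(x)$ is 1.

```python
from functools import lru_cache

def poly_mul(f, g):
    """Multiply integer coefficient sequences (low degree first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

def poly_div_exact(f, g):
    """Divide f by a monic g; raise if the remainder is nonzero."""
    f = list(f)
    quotient = [0] * (len(f) - len(g) + 1)
    for k in range(len(quotient) - 1, -1, -1):
        quotient[k] = f[k + len(g) - 1]      # g is monic, so no division is needed
        for j, gj in enumerate(g):
            f[k + j] -= quotient[k] * gj
    if any(f):
        raise ValueError("division left a remainder")
    return quotient

@lru_cache(maxsize=None)
def cyclotomic(h):
    """Coefficients of Phi_h over the integers via Phi_h = f_h / P_h."""
    P = [1]                                   # P_h: product of Phi_d, d | h, d != h
    for d in range(1, h):
        if h % d == 0:
            P = poly_mul(P, cyclotomic(d))
    f_h = [-1] + [0] * (h - 1) + [1]          # x^h - 1
    return tuple(poly_div_exact(f_h, P))

phi_12 = cyclotomic(12)
# Applying the homomorphism onto the residue classes mod 7 to each coefficient.
phi_12_mod_7 = [c % 7 for c in phi_12]
```

The result for h = 12 agrees with step VI, and reducing the coefficients mod 7 gives $x^4 + 6x^2 + 1$, which is $x^4 - x^2 + 1$ read in characteristic 7.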

The following representation, in terms of the Möbius function $\mu(n)$ (cf. IB6, §5)
also holds for the principal case:


(14) $\Phi_h(x) = \prod_{d \mid h} [f_d(x)]^{\mu(h/d)}$,


but we shall make no further use of it. For comparison we calculate $\Phi_{12}(x)$
from (14). Since $\mu(1) = \mu(6) = 1$, $\mu(2) = \mu(3) = -1$, and $\mu(4) = \mu(12) = 0$,
we have
$\Phi_{12}(x) = (x^{12} - 1) \cdot (x^2 - 1) : [(x^6 - 1) \cdot (x^4 - 1)]$
$= (x^{14} - x^{12} - x^2 + 1) : (x^{10} - x^6 - x^4 + 1)$
$= x^4 - x^2 + 1$.


Exercises 


31. Let K be of characteristic $\neq 2$, $\neq 3$. Determine ®,,(x). How is ®,,(x)
related to ®,,(x) and ®,(x)? (cf. ex. 30.) Prove that 


_ G4 — 1-4-1) 
Bul) = GET) FT) 




5.6. Concluding Remark 


We return to the two questions raised in §5.1. Let $\zeta$ be a primitive hth
root of unity (in a suitable extension of K); then $K(\zeta)$ is obviously the
smallest splitting field not only of the polynomial $f_h(x)$ but also of $\Phi_h(x)$.
Thus if $\zeta_1$ and $\zeta_2$ are two primitive hth roots of unity, then $K(\zeta_1) = K(\zeta_2)$.
As a result, $\Phi_h(x)$ is either irreducible over K or splits into irreducible
polynomials, all of which are of the same degree. In §8 we deal with the
case that K is a finite field. In §9 it will be shown that $\Phi_h(x)$ is irreducible
over the field $\Pi^{(0)}$ of the rational numbers.


6. Isomorphic Mappings of Separable Finite Extensions 


The Galois theory of separable polynomials is based on a study of the 
isomorphic mappings J relative to K (or over K) of a separable finite 
extension E of K in a separable normal extension field N* of K (cf. the 
end of §3, convention). Thus J leaves every element of K fixed, while 
every element of E is mapped onto an element of N*; so we may write J 
more explicitly in the form J(E: K; N*) or J(E: K). The images a’ of 
an element $a \in E$ under the mappings J(E : K) are called the conjugates
of a with respect to E over K; or in symbols, a' = conj(a; E : K). By the
number of conj(a; E : K) we mean the number of mappings J(E : K), even
if the conj(a; E : K) are not all distinct (compare the following examples).


Examples. (1) If E = K(a) is separable over K, it follows from §1.4,
theorem 3, that all the mappings J(E : K) are obtained by replacing a with
each of the zeros (in N*) of the irreducible (in K[x]) polynomial g(x) for
which g(a) = 0. The number n of the J(E : K) is thus equal to degree (E : K),
or to the maximal number n for which the $a^0, \ldots, a^{n-1}$ are linearly independent
over K, or finally to the degree n of g(x). For since E is separable over K, the
number n is equal to the number of zeros of g(x), in view of the fact that these
zeros, and thus also the conj(a; E : K), are all distinct. But if, for example,
$b \in K$, then the conj(b; E : K) are all equal, since they are all equal to b.

(2) Now let K' = K(a), or E = K'' = K'(b), be separable over K or over K'
respectively, where $n' = \mathrm{degree}(K' : K) \geq 2$ and $n'' = \mathrm{degree}(K'' : K') \geq 2$.
The number of the J(E : K) is again equal to $\mathrm{degree}(E : K) = n' \cdot n'' = n$
(cf. §2, theorem 3). But among the conj(a; E : K) there are only n' distinct
elements; for under each of the mappings J(E : K) the polynomial g(x) with
g(a) = 0 (cf. example (1)) is mapped onto itself, so that a is mapped onto
one of the n' zeros of g(x), or in other words onto one of the conj(a; E : K).
Thus the conj(a; E : K), the number of which is equal to $n = n' \cdot n''$, fall into
n' classes of n'' equal elements each.


From the isomorphism theorem (§1.4, theorem 3), we obtain the 
following generalization of the results in example (2). 


Theorem 1. Hypothesis. The element $a_{\nu+1}$ is algebraic over
$K_\nu = K(a_0, \ldots, a_\nu)$, $\nu = 0, \ldots, n - 1$, $a_0 \in K$, $K_0 = K$. Moreover, $a_\nu$ has
exactly $k_\nu$ distinct conj($a_\nu$; $K_\nu : K_{\nu-1}$), $\nu = 1, \ldots, n$.

Conclusion. (1) There exist exactly $k = k_1 \cdots k_n$ isomorphic mappings
J($K_n$ : K; N*).¹² Thus for every $b \in K_n$ the number of (not always distinct)
conj(b; $K_n$ : K) is equal to k.


Conclusion. (2) If $K_n$ is separable over K, then $k_\nu = \mathrm{degree}(K_\nu : K_{\nu-1})$,
so that $k = \mathrm{degree}(K_n : K)$.

If E is an extension of K with E = K(a), then a is said to be a primitive 
element of E (over K). We then have the following theorem. 


Theorem 2. If E is a finite separable extension of K, then $b \in E$ is a
primitive element of E over K if and only if all the conj(b; E : K) are distinct.


Proof. Necessity. If b is primitive, the assertion follows from the
preceding theorem or from example (1).


Sufficiency. If the conj(b; E : K) are all distinct, then the number n of
them is equal to degree(E : K) (compare the preceding theorem, conclusion (2)).
On the other hand, the number of distinct conj(b; E : K)
is not greater than the degree k of the irreducible (in K[x]) polynomial g(x)
for which g(b) = 0. Thus $n \leq k$. But n = degree(E : K) is equal to the
dimension of E over K. Thus $k \leq n$ = degree(E : K), from which k = n.
Thus $b^0, \ldots, b^{n-1}$ is a basis of E, so that E = K(b), as was to be proved.

Furthermore, we have the following important theorem. 


Theorem 3. Every separable finite extension E of K has primitive 
elements and can thus be represented as a simple extension E = K(b) of K. 


Proof. (A) If X is a finite field, then E is also finite. Thus the assertion 
follows from §8.2. 

(B) If K contains infinitely many elements, then by §2, theorem 2,
we have $E = K(a_1, \ldots, a_n)$ for (finitely many) suitably chosen elements
$a_1, \ldots, a_n \in E$ algebraic over K. For n = 1 the assertion is true with
$b = a_1$. Arguing by complete induction on n, we now assume that
the assertion is true for n = 1, ..., k, and that $E = K(a_1, \ldots, a_k, a_{k+1})$
or $E = E_1(c)$ for $E_1 = K(a_1, \ldots, a_k)$ and $c = a_{k+1}$. By the induction
assumption we may set $E_1 = K(b)$, so that it only remains to prove the
assertion for K(b, c). If $b_1 = b, \ldots, b_r$, or $c_1 = c, \ldots, c_t$, form the complete
set of conj(b; K(b) : K) or conj(c; K(c) : K) respectively, then (because
of the separability of E over K) all the $b_i$ (i = 1, ..., r) are distinct from
one another, and similarly all the $c_j$ (j = 1, ..., t) (compare theorem 1,
conclusion (2)). If we set $a_{\rho\tau} = b_\rho + d c_\tau$ with d ∈ K, and $a = a_{11}$, then
the conj(a; E : K) are included among the $a_{\rho\tau}$, since every J(E : K)
maps b or c onto a $b_\rho$ or $c_\tau$, respectively, and maps d onto itself;
furthermore, to every J(E : K) there corresponds exactly one ρ and exactly
one τ (since the conj(b; K(b) : K) and also the conj(c; K(c) : K) are all
distinct), so that the mappings J(E : K) correspond in a one-to-one way
with certain pairs (ρ, τ). Thus if there exists a d ∈ K for which $a_{\rho\tau} \neq a_{\mu\nu}$
provided ρ ≠ μ or τ ≠ ν, then all the conj(a; E : K) are distinct, and
the assertion follows from theorem 2. The existence of such a d ∈ K
follows from the fact that on the one hand the finitely many equations
$a_{\rho\tau} = a_{\mu\nu}$ in the unknown d for all ρ, τ, μ, ν with ρ ≠ μ or τ ≠ ν are at
most of first degree in d and thus each of them has at most one solution,
and on the other hand K is infinite.


12 In this conclusion (1) it is not necessary for N* to be separable.
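
The choice of d in part (B) can be tried out on a small case. The following Python sketch is an illustration only and is not part of the proof. It assumes that K is the field of rational numbers and that b = √2, c = √3 (an example not taken from the text); an element r + s√2 + t√3 is stored as the triple (r, s, t), and the function names are hypothetical. The sketch forms the four candidates $a_{\rho\tau} = b_\rho + d c_\tau$ and tests whether they are distinct.

```python
from fractions import Fraction
from itertools import product

# Assumption for this sketch: K = Q, b = sqrt(2), c = sqrt(3).
# The triple (r, s, t) stands for r + s*sqrt(2) + t*sqrt(3); since
# 1, sqrt(2), sqrt(3) are linearly independent over Q, two triples
# are equal exactly when the field elements are equal.
B_CONJUGATES = [(0, 1, 0), (0, -1, 0)]   # conj(b): +sqrt(2), -sqrt(2)
C_CONJUGATES = [(0, 0, 1), (0, 0, -1)]   # conj(c): +sqrt(3), -sqrt(3)


def candidates(d):
    """Return all values b_rho + d * c_tau for the given rational d."""
    d = Fraction(d)
    return [
        (b[0] + d * c[0], b[1] + d * c[1], b[2] + d * c[2])
        for b, c in product(B_CONJUGATES, C_CONJUGATES)
    ]


def gives_primitive_element(d):
    """True if the candidates are pairwise distinct, as theorem 2 requires."""
    values = candidates(d)
    return len(set(values)) == len(values)
```

In this example d = 1 yields the four distinct values ±√2 ± √3, so that √2 + √3 is primitive, whereas d = 0 is one of the finitely many excluded values, since the candidates collapse to ±√2.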


Corollary. If K'' is a finite separable extension of K, and if K' is an
extension field of K contained in K'', or, in other words, if K' is a so-called
intermediate field, then there exist a', a'' ∈ K'' such that K' = K(a') and
K'' = K'(a''), since K' and K'' are finite and separable over K and K'
respectively.

As an extension of this result we have the following theorem. 


Theorem 4. Hypothesis. Let E be a finite separable extension of K,
and for a ∈ E assume that the set of conj(a; E : K) contains r distinct
elements.


Conclusion (1). It follows that r = degree(K(a) : K) and r · t =
degree(E : K), where t = degree(E : K(a)).

Conclusion (2). Moreover, a ∈ K if and only if all the conj(a; E : K)
are equal.


Proof. For (1). This conclusion follows from theorem 1. For (2).
Necessity. The necessity is clear. Sufficiency. By conclusion (1) we have
r = degree(K(a) : K) = 1, from which it follows that a ∈ K.


Exercises 


32. Let $K = \Pi^{(0)}$. Let $\xi_1$ be a zero of $x^6 + 10x^3 + 125$ and $\varepsilon_1$ a zero of
$x^2 + x + 1$ (cf. ex. 25). Set $K^+ = K(\varepsilon_1)$, $\bar{K} = K(\varepsilon_1, \xi_1)$, $K_1 = K(\xi_1)$ and
determine $\deg(K^+ : K)$, $\deg(\bar{K} : K)$, $\deg(\bar{K} : K^+)$, $\deg(K_1 : K)$, $\deg(\bar{K} : K_1)$.
In each case, determine the relative isomorphisms, i.e. the $J(K^+ : K)$,
$J(\bar{K} : K)$, $J(\bar{K} : K^+)$, $J(K_1 : K)$, $J(\bar{K} : K_1)$, and the number of these
isomorphisms. What are the images a) of $\xi_1$, b) of $\varepsilon_1$ under the
isomorphisms $J(\bar{K} : K)$? Prove that $\bar{K} = K(\varepsilon_1 + \xi_1)$, so that $\varepsilon_1 + \xi_1$
is a primitive element of $\bar{K}$ (over K).


7. Normal Fields and the Automorphism Group 
(Galois Group) 


Summary of the argument in §7. In the introduction to the present
chapter we have seen that the "solution of an algebraic equation"
corresponds to the construction of a smallest splitting field N for the polynomial
over K that defines the equation. The field N is obtained by adjunction
of finitely many $a_\kappa$, $\kappa = 1, \ldots, k$, such that


$K \subset K(a_1) \subset \cdots \subset K(a_1, \ldots, a_k) = N$.


Now instead of asking for such elements $a_\kappa$ it is obvious that we can
also seek an increasing chain of (suitable) intermediate fields $Z_\kappa$, for
example $Z_\kappa = K(a_0, \ldots, a_\kappa)$, with $\mathrm{degree}(Z_\kappa : Z_{\kappa-1}) > 1$ ($Z_0 = K$). The
decisive feature of this change in the form of what we are seeking is the
fact that (cf. theorem 2) the total number of all the intermediate fields
between N and K is finite (whereas, in general, there are infinitely many
possibilities for the choice of the elements $a_1, \ldots, a_k \in N - K$). The same
theorem (theorem 2 below) also states that under the isomorphic mappings
J(N : K) the field N is mapped onto itself and the entire set of mappings
J(N : K) constitutes a finite group G(N : K) of order degree(N : K).
Moreover, the intermediate fields between N and K and the subgroups
of G(N: K) are in one-to-one correspondence with each other in such 
a way that a larger intermediate field corresponds to a smaller subgroup. 
Finally (cf. theorem 3), conjugate subgroups correspond to conjugate
intermediate fields (i.e., to fields that are mapped onto each other by
one of the J(N : K)); thus, in particular, the normal subgroups of the
group correspond to intermediate fields that are normal over K. The
increasing chains of intermediate fields $Z_\kappa$ that provide the desired solution
are then seen to be chains for which the corresponding subgroups $U_\kappa$
are "maximal" relative normal subgroups, i.e., such that $U_\kappa \subset U_{\kappa-1}$
and $U_\kappa$ is a "maximal" normal subgroup of $U_{\kappa-1}$, so that $Z_\kappa$ is thus a
"minimal" normal field over (relative to) $Z_{\kappa-1}$. The fact that $U_\kappa$ is a
maximal normal subgroup of $U_{\kappa-1}$ and thus that $Z_\kappa$ is a minimal normal
field over $Z_{\kappa-1}$ has the following significance: the chain of subgroups,
and of corresponding intermediate fields, cannot be refined by the insertion 
of additional relative normal subgroups and relative normal intermediate 
fields. It may be said that such a chain provides all the successive adjunc- 
tions that are indispensable for the solution of the problem. 


Details of the argument. We now proceed to the detailed proofs. 
Again we consider only finite separable normal extensions N of an 
(arbitrary) field K, so that we may set N* = N (cf. end of §3, convention). 
In view of the existence of a primitive element a of N over K, we set 
N = K(a). We first prove the following theorem. 


Theorem 1. (1) The isomorphisms J(N: K) of N relative to K map 
N onto itself and are thus automorphisms of N over K. 


(2) These automorphisms form a finite group of order n = degree(N : K),
the Galois group G(N : K) of N over K.

Proof. (I) Assertion (1). Since N is normal over K, all the conj(a; N : K)
lie in N and (for primitive a) are distinct, and the number of them is
n = degree(N : K). We now set J = J(N : K) and let N' = J(N) and
a' = J(a) be the images of N and a respectively under J. Since J is an
isomorphism, it follows that N' = K(a') and N' is normal over K; thus,
in view of the fact that a' ∈ N, it follows that N' ⊂ N. Since we also have
N ⊂ N', it follows that N = N'.


(II) Assertion (2). By (I) the number of mappings J(N : K) is
n = degree(N : K). Furthermore, the inverse mapping of a J(N : K)
is again a J(N : K), as is the identity mapping, and the product of
two J(N : K) (i.e., the result of their successive application) is again a
J(N : K), as was to be proved.


We now turn to the correspondence between the subgroups of G(N : K) 
and the intermediate fields Z between K and N. For abbreviation, the 
system of all those automorphisms J(N : K) under which a Z remains 
elementwise fixed will be denoted by U(Z), so that U(Z) is a subgroup 
of G(N : K), and to every Z there corresponds a unique U(Z). Similarly, 
the system of all those elements of N that remain fixed under all the
automorphisms of a subgroup V of G(N : K) will be denoted by T(V),
so that T(V) is an intermediate field and to every V there corresponds
a unique T(V). Then U(Z) produces a unique correspondence between
the intermediate fields Z and certain subgroups U(Z), and similarly T(V)
produces a unique correspondence between the subgroups V and certain
intermediate fields T(V). We now show that this set of subgroups U(Z)
contains all the subgroups, and similarly the set of intermediate fields
T(V) contains all the intermediate fields. In fact, we have the following
fundamental theorem. 


Theorem 2. Hypothesis. Let N be a finite separable normal extension 
of K. 


Conclusion. The correspondence U = U(Z) assigns, in a one-to-one way, 
to every intermediate field Z a subgroup U, and similarly T = T(V) 
assigns, in a one-to-one way, to every subgroup V (of G(N : K)) an inter- 
mediate field T. These two correspondences are inverse to each other: 
Z = T(U(Z)) and V = U(T(V)) for all Z and all V. Thus the set of 
intermediate fields is mapped in a one-to-one way onto the set of subgroups. 
To every subgroup U there corresponds the greatest intermediate field Z
that remains elementwise fixed under the automorphisms of U (U = U(Z)),
and V is the greatest subgroup whose automorphisms leave T(V) elementwise
fixed. Thus V is the Galois group of N over T(V); that is, V = G(N : T(V)).
The group G(N : K) is finite, and thus N contains only finitely many fields
between K and N.

Proof. For Z = T(U(Z)) with arbitrary Z. By definition, U(Z) is
the (greatest) subgroup of G(N : K) that leaves Z elementwise fixed.
Since N is also normal over Z, the group U(Z) must be the Galois group
G(N : Z). But G(N : Z) leaves exactly the elements of Z fixed (§6,
theorem 4).


For V = U(T(V)) with arbitrary V. In any case we have
V ⊂ G(N : T(V)) = U(T(V)). Thus if we denote by v the order of V
and by g the order of G(N : T(V)), then v ≤ g. But now we shall also
prove that g ≤ v. To this end we set degree(N : K) = k and N = K(a).
The k elements conj(a; N : K) are all distinct; let us denote them by
$a_1 = a, \ldots, a_k$, so that $N = K(a_\kappa)$, $\kappa = 1, \ldots, k$. (Compare §6, theorem 2.)
If $J_1, \ldots, J_v$ are the automorphisms of V, then (with a suitable
enumeration of the $a_\kappa$) we have $a_\tau = J_\tau(a_1)$, $\tau = 1, \ldots, v \le k$. But the
$a_\tau$ are only interchanged among themselves by the mappings $J_\sigma$, since
$J_\sigma(a_\tau) = J_\sigma(J_\tau(a_1)) = (J_\sigma J_\tau)(a_1)$ and $J_\sigma J_\tau \in V$, in view of the fact that V
is a group. Thus the coefficients of $p(x) = (x - a_1) \cdots (x - a_v)$ are
invariant under the mappings $J_\sigma \in V$ and consequently belong to T(V),
so that p(x) ∈ T(V)[x]. Since v is the degree of p(x), it follows that
degree(K(a) : T(V)) = degree(N : T(V)) ≤ v. But by theorem 1 we have
g = degree(N : T(V)), so that g ≤ v. Consequently g = v, and V ⊂ U(T(V))
implies V = U(T(V)). Thus the U(Z) and T(V) each give rise to a
one-to-one mapping of the set of all the Z onto the set of all the V, as
follows from the uniqueness of the U(Z) and T(V) and from the fact
that Z = T(U(Z)) and V = U(T(V)).
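
The bookkeeping of theorem 2 can be made concrete for a small group. The following Python sketch is only an illustration under assumptions that are not part of the text: it takes the Galois group to be the symmetric group on three points (as it is, for instance, for the splitting field of an irreducible cubic whose group is $S_3$, cf. ex. 33), represents automorphisms as permutation tuples, and uses hypothetical function names. For each subgroup U it lists the order of U, which is degree(N : T(U)), together with the index 6 / |U|, which is degree(T(U) : K).

```python
from itertools import combinations, permutations


def compose(p, q):
    """Return the permutation 'first q, then p' (read from right to left)."""
    return tuple(p[q[i]] for i in range(len(q)))


def generated(gens, n):
    """Subgroup of the symmetric group on n points generated by gens."""
    identity = tuple(range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return frozenset(group)


def subgroups_of_s3():
    """All subgroups of S_3; each one is generated by at most two elements."""
    elems = list(permutations(range(3)))
    subs = {generated((), 3)}
    for size in (1, 2):
        for gens in combinations(elems, size):
            subs.add(generated(gens, 3))
    return subs


def degree_table():
    """Sorted pairs (order of U, index of U) = (degree(N:T(U)), degree(T(U):K))."""
    return sorted((len(u), 6 // len(u)) for u in subgroups_of_s3())
```

Under these assumptions `degree_table()` reports six subgroups, hence six intermediate fields: the larger the subgroup, the smaller the degree of its fixed field over K.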

In continuation of the outline at the beginning of the present section, 
we now proceed as follows: we show that the one-to-one correspondence 
just proved between the subgroups and the intermediate fields allows 
us to deduce the structure of the normal field N over K from the structure
of the Galois group. For this purpose we first introduce some terminology:
by the intersection of given subgroups or given intermediate fields we
mean the greatest subgroup, or the greatest intermediate field, that is
contained in all the given groups, or fields; by the union (compositum)
of a given set of groups or of intermediate fields we mean the smallest
subgroup, or the smallest intermediate field, in which all the given
subgroups, or fields, are contained. (We note that the intersection is at the
same time the set of all mappings, or field elements respectively, that
belong to every one of the given subgroups, or to every one of the given
intermediate fields.) We now have the following theorem.


Theorem 3. (1) In the correspondence (given by the preceding theorem)
V = U(Z) or Z = T(V) the intersection, or the union, of a set of subgroups
corresponds to the union or the intersection, respectively, of the corresponding
intermediate fields, and conversely.

(2) The intermediate field Z' is isomorphic over K to the intermediate
field Z'' if and only if U(Z') is conjugate in G(N : K) to U(Z'').

(2a) Thus in particular every normal subgroup of G(N : K) is the image 
of an intermediate field that is normal over K, and conversely. 

In accordance with (2), we say that two intermediate fields Z’, Z” 
isomorphic over K are conjugate in N over (with respect to) K; or in 
symbols, Z” = conj(Z’; N: K). 

Proof. For (1). We denote the compositum and the intersection of
the subgroups V', V'' by V' ∨ V'' and V' ∧ V'', respectively (in IB2 we
wrote ⟨V' ∪ V''⟩ and V' ∩ V''), and correspondingly for the intermediate
fields. From the definition of V' = U(Z'), V'' = U(Z'') we have the
result: to Z = Z' ∨ Z'' the correspondence V = U(Z) assigns the greatest
subgroup V under which both Z' and Z'' remain elementwise fixed, so
that V ⊂ V' ∧ V''. On the other hand, both Z' and Z'' remain elementwise
fixed under every subgroup contained in V' ∧ V'', so that V = V' ∧ V''.
In the same way, Z' ∧ Z'' = T(V' ∨ V'').

For (2). The subgroups V', V'' are conjugate (in G) if and only if
there exists a J ∈ G with $V'' = J^{-1}V'J$, and thus $V' = JV''J^{-1}$, where
the symbol $J^{-1}J'J$ with J' ∈ V' means that J' is to be carried out after
J and $J^{-1}$ after J', or in other words that the operations are to be read
from right to left.

We now let V' = U(Z'), V'' = U(Z'') and Z' = J(Z), V = U(Z) so
that, for example, Z'' = T(V''), $Z = J^{-1}(Z')$. Then we must show that
Z = Z''. The proof proceeds as follows. Let J' ∈ V' be arbitrary. Under J
the field Z is mapped onto Z'; under J' the field Z' remains fixed and
under $J^{-1}$ it is mapped back onto Z, in such a way that Z remains
elementwise fixed under $J^{-1}J'J$. Thus $J^{-1}V'J \subset V$. If we now interchange J with
$J^{-1}$ and correspondingly Z' with Z and V' with V, we likewise have
$JVJ^{-1} \subset V'$, so that $V \subset J^{-1}V'J$. Thus $V = J^{-1}V'J$, that is, $V' = JVJ^{-1}$;
since also $V'' = J^{-1}V'J$, we have V = V''
and therefore Z = T(V) = T(V'') = Z''. Conversely, it now follows
from the one-to-oneness of V = U(Z) that if V'' = U(Z'') and V' = U(Z'),
and also Z' = J(Z''), then $V'' = J^{-1}V'J$.

Remark. Under the operations of formation of the compositum and 
the intersection, the system v of all the subgroups, and similarly the 
system z of all the intermediate fields, becomes a lattice (not necessarily
distributive). Then under the correspondence V = U(Z) it follows from 
the first conclusion of the above theorem that v and z correspond dually 
to each other (cf. IB9, §1). 

From the above results we now have the following two theorems. 


Theorem 4. Let Z' ⊂ N be normal over Z'', and write G' = G(N : Z')
and G'' = G(N : Z''). Then G' is a normal subgroup of G'' and the factor
group G''/G' is isomorphic to G(Z' : Z'').

For we see that the residue classes of G'' with respect to G' consist
of exactly those J ∈ G'' which generate the same automorphism A of Z'
over Z'', and the product of two residue classes corresponds to the product
of the corresponding A.


Theorem 5. Every composition series of G(N : K) (cf. IB2, §12)
corresponds to a "composition series" of N over K, namely to an increasing
sequence of intermediate fields $Z_0 = K \subset Z_1 \subset \cdots \subset Z_k = N$ in such a
way that $Z_\kappa$ is a smallest (in N) normal field over $Z_{\kappa-1}$, $\kappa = 1, \ldots, k$, and
conversely. All these composition series of N have the same length, and the
groups $G(Z_\kappa : Z_{\kappa-1})$ are simple (they have no proper normal subgroups).


Exercises 


33. Let $K$, $K_1$, $\bar{K}$, $K^+$, $\xi_1$, $\xi_2$, $\eta_1$ be as in ex. 24. Also let $\xi_3$ be the third zero
of $x^3 + 6x + 2$ and set $K(\xi_2) = K_2$ and $K(\xi_3) = K_3$. Prove that
$K^+$ and $\bar{K}$ are normal over K. Consider the Galois groups $G(\bar{K} : K)$,
$G(\bar{K} : K^+)$ and $G(K^+ : K)$. Prove that $G(\bar{K} : K)$ is isomorphic to $S_3$
(the symmetric group on three elements). Determine the fields into
which $K_1$ is taken by the automorphisms in $G(\bar{K} : K)$.

34. Let $K$, $K^+$ and $\bar{K}$ be as in ex. 32, and (for 1 ≤ i ≤ 6) set $K(\xi_i) = K_i$
(the $\xi_i$ as in ex. 25). Show that $K^+$ and $\bar{K}$ are normal over K.
Investigate the structure of $G(\bar{K} : K)$, $G(\bar{K} : K^+)$ and $G(K^+ : K)$.

Prove

(a) $G(\bar{K} : K^+)$ is isomorphic to $S_3$,

(b) the zeros $\eta_1$, $\eta_2$, $\eta_3$ of the polynomial $x^3 - 15x + 10$ can be
chosen in $\bar{K}$,

(c) the binomial $x^2 - 3$ is also reducible over $\bar{K}$.

35. In exs. 33, 34 determine

(a) the subgroups of $G(\bar{K} : K)$,

(b) the intermediate fields in $\bar{K}$ over K,

(c) the correspondence between the subgroups and the intermediate
fields.

36. In exs. 33, 34 construct a composition series in each case (in ex. 34
there are several possibilities) and determine the corresponding
sequence of intermediate fields.


8. Finite Fields 


8.1. Preliminary Remark. Simple Relations in Fields of Characteristic p 

Some remarks on the importance and the historical development of
finite fields are to be found at the end of the present section. The discussion
in the section itself is given in modern form, corresponding to the general
contents of the present volume on algebra.

Every finite field has a positive characteristic, so that we are now in a
position to assemble some relations that hold in fields of characteristic p.$^{13}$

In the first place, it is well known that in integral domains (in particular,
in fields) of characteristic p we have

(1)    $(a + b)^p = a^p + b^p$.

By repeatedly raising this equation to the pth power we readily obtain

(2)    $(a + b)^{p^f} = a^{p^f} + b^{p^f}$    $(f = 1, 2, 3, \ldots)$.

Setting a = c - d and b = d in (2) gives $(c - d)^{p^f} = c^{p^f} - d^{p^f}$, or,
changing the letters,

(3)    $(a - b)^{p^f} = a^{p^f} - b^{p^f}$.

From (2) we have

$(a + b + c)^{p^f} = (a + b)^{p^f} + c^{p^f} = a^{p^f} + b^{p^f} + c^{p^f}$.

By complete induction on n it follows that

(4)    $(a_1 + a_2 + \cdots + a_n)^{p^f} = a_1^{p^f} + a_2^{p^f} + \cdots + a_n^{p^f}$.

By $\Pi^{(p)}$ we again denote the prime field of characteristic p. Then

(5)    $a^p = a$, if $a \in \Pi^{(p)}$,

since for integral n > 0 we have

$n^p \equiv (1 + 1 + \cdots + 1)^p \equiv 1^p + 1^p + \cdots + 1^p \equiv n \pmod{p}$.

Now let g(x) be a polynomial over $\Pi^{(p)}$, say

$g(x) = b_m x^m + b_{m-1} x^{m-1} + \cdots + b_0$,

where all the $b_\mu$ belong to $\Pi^{(p)}$. Then from (5) and from the application
of (4) for f = 1 to the integral domain $\Pi^{(p)}[x]$ we have

$[g(x)]^p = b_m^p x^{pm} + b_{m-1}^p x^{p(m-1)} + \cdots + b_0^p = b_m x^{pm} + b_{m-1} x^{p(m-1)} + \cdots + b_0 = g(x^p)$,

and thus

(6)    $[g(x)]^p = g(x^p)$, if g(x) is a polynomial over $\Pi^{(p)}$.

If we apply (3) to the integral domain $\Pi^{(p)}[x]$, we obtain

$(x^h - 1)^{p^f} = x^{h p^f} - 1$,

which is exactly the equality used in §5 and denoted there by (5).


13 Cf. also §4.1.
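
Relations (1) and (6) are easy to check numerically for small primes. The following Python sketch is a check on examples, not a proof, and its function names are hypothetical. It represents a polynomial over the prime field of characteristic p by its list of coefficients (lowest degree first, reduced mod p), tests that the inner binomial coefficients of the pth power vanish mod p, which is what lies behind (1), and compares $[g(x)]^p$ with $g(x^p)$.

```python
from math import comb


def inner_binomials_vanish(p):
    """True if every binomial coefficient C(p, k), 0 < k < p, is divisible by p."""
    return all(comb(p, k) % p == 0 for k in range(1, p))


def poly_mul(f, g, p):
    """Product of two coefficient lists (lowest degree first), reduced mod p."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out


def poly_pow(f, e, p):
    """The e-th power of f, computed by repeated multiplication mod p."""
    result = [1]
    for _ in range(e):
        result = poly_mul(result, f, p)
    return result


def substitute_x_power_p(g, p):
    """Coefficient list of g(x^p): the coefficient of x^i moves to x^(p*i)."""
    out = [0] * (p * (len(g) - 1) + 1)
    for i, a in enumerate(g):
        out[p * i] = a % p
    return out
```

For instance, with p = 5 and $g(x) = x^3 + 3x^2 + 2$ the two coefficient lists returned by `poly_pow` and `substitute_x_power_p` coincide, as (6) asserts; for a composite modulus such as 4 the binomial test fails.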


Exercises 


37. Give the decomposition over JT of: 
(a) x®® — 1, (6) x7” — 1. 

38. Decompose the following polynomials into linear factors over 
IT); (a) x? — x, (b) x?-t — 1, (c) x? — xP", (d) xen” — 1, 


8.2. Fundamental Theorems on Finite Fields 


By a finite field we mean a field that contains only finitely many elements.
The number of elements will be denoted by q, where we assume q > 1.
The finite field with q elements will be denoted by $F_q$,$^{14}$ and the
characteristic, which must of course be positive, will be denoted by p. Obviously
$F_q$ is a finite$^{15}$ and therefore algebraic$^{15}$ extension of $\Pi^{(p)}$. We now prove
the following theorem.


Theorem 1. The field $F_q$ is separable$^{16}$ over $\Pi^{(p)}$, and therefore separable
over every intermediate field Z.


For if α is an element of $F_q$ and h(x) is the irreducible polynomial in
$\Pi^{(p)}[x]$ for which α is a zero, then if h(x) were inseparable it could be
represented in the form $h(x) = g(x^p)$, where g(x) is a polynomial in
$\Pi^{(p)}[x]$, and it would follow from (6) that $h(x) = [g(x)]^p$, so that h(x)
would be reducible, which completes the proof of the theorem.


Theorem 2. The field $F_q$ can be represented as a simple extension of
the prime field $\Pi^{(p)}$, and therefore of any intermediate field Z. The
multiplicative group of the field $F_q$ is cyclic.


For the proof we consider the multiplicative group of the field $F_q$,
consisting of all the nonzero elements of $F_q$. Since its order is s = q - 1,
for every nonzero element ξ of $F_q$ we have

(7)    $\xi^s - 1 = 0$.

The polynomial$^{17}$

(8)    $f_s(x) = x^s - 1$

thus has exactly s distinct zeros in $F_q$. But these zeros are precisely the
sth roots of unity, and since they are distinct from one another it follows,
as was proved in §5, that s is coprime to p and that (at least) one primitive
sth root of unity ζ exists. Consequently $F_q = \Pi^{(p)}(\zeta)$, since the element 0
already belongs to $\Pi^{(p)}$ and every nonzero element ξ can be represented
in the form $\zeta^i$. Then it follows easily that $F_q = Z(\zeta)$, which completes
the proof not only of theorem 2 but also of the fact that every primitive
sth root of unity ζ is a generating element of $F_q$.


14 The notation is reasonable, since we shall see that the structure of the field depends
only on q. The letter F is used to suggest the word "field." In the literature a finite field
is often called a "Galois field" and is denoted by GF(q) instead of $F_q$.

15 Cf. §2.

16 Cf. §4.
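
For a prime field the cyclic structure asserted in theorem 2 can be observed directly. The following Python sketch is an illustration only; it assumes q = p is a prime, so that the field is the set of residues mod p, and the function names are hypothetical. It determines the order of each nonzero residue by brute force and collects those of order s = p - 1, which are the primitive sth roots of unity and hence the generators of the multiplicative group.

```python
def element_order(a, p):
    """Multiplicative order of a modulo the prime p (a not divisible by p)."""
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k


def generators(p):
    """All residues whose powers exhaust the nonzero elements mod p."""
    return [a for a in range(1, p) if element_order(a, p) == p - 1]
```

For p = 7 the generators are 3 and 5: the powers of 3 run through 3, 2, 6, 4, 5, 1, so that every nonzero element has the form $3^i$.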


Theorem 3. There exists a natural number ν such that

(9)    $q = p^\nu$.

In other words: the number q of elements of a finite field is a power of the
characteristic.


For we again let ζ be a primitive sth root of unity and let the
corresponding irreducible polynomial in $\Pi^{(p)}[x]$ have the degree ν; then every α
in $F_q$ can be represented uniquely in the form

(10)    $\alpha = c_0 + c_1 \zeta + c_2 \zeta^2 + \cdots + c_{\nu-1} \zeta^{\nu-1}$,

where the coefficients belong to $\Pi^{(p)}$. Conversely, every such expression
with coefficients in $\Pi^{(p)}$ is obviously an element of $F_q$. Since there are p
possible values for each of the ν coefficients, the result (9) follows at once.


Exercises 


39. Investigate the polynomial $x^9 - x$ corresponding to $F_9[x]$. Show:
a) it falls into linear factors in $F_9[x]$; b) its zeros (except for zero) are
the eighth roots of unity; c) it falls into linear and quadratic factors
in $F_3[x]$; d) two of its quadratic factors (let them be denoted by g(x)
and h(x)) have the primitive eighth roots of unity for their zeros;
e) if $\xi_1$ is a zero of g(x) (in $F_9$), then $\xi_1^3$ is the other zero; also, $\xi_1^5$ and
$\xi_1^7$ are the zeros of h(x).


40. (Cf. ex. 39). Every element of $F_9$ can be represented in the form
$a\xi_1 + b$, where a and b are elements of $F_3$. Set up the multiplication
table. E.g., what is $(\xi_1 + 2) \cdot (2\xi_1 + 1)$? (Hint. Use one of the
polynomials g(x) and h(x) in ex. 39.)

41. (a) Determine the polynomials of third degree in $F_3[x]$ which are
irreducible over $F_3$ (the coefficient of $x^3$ is always to be taken equal
to unity). (Hint. There are eight of them; every polynomial of
third degree in $F_3[x]$ which is reducible over $F_3$ has a zero in $F_3$.)

(b) Compute $\Phi_{13}(x)$ and $\Phi_{26}(x)$ and factor these polynomials into their
irreducible (over $F_3$) factors. (Hint. The irreducible factors are
the irreducible polynomials determined in a)).

(c) Factor the polynomial $x^{27} - x$ into its irreducible (over $F_3$) factors.

(d) Investigate the structure of $F_{27}$ with the aid of one of the irreducible
polynomials determined in a).


17 Cf. §5(1).


8.3. Existence and Uniqueness of $F_{p^\nu}$ for Arbitrary p and Arbitrary ν


Theorem 4. Let there be given an arbitrary prime p and an arbitrary
natural number ν. Then there exists (at least) one finite field with $p^\nu$ elements.
All finite fields with the same number of elements are isomorphic to one
another.

We set $q = p^\nu$. The proof depends on the remark that if the field $F_q$
exists, its nonzero elements are exactly the zeros of $x^{q-1} - 1$, from which
it follows at once that the polynomial $x^q - x$ has all the elements of $F_q$
for its zeros, and only these. Thus we have proved (for the time being
under the hypothesis that the field $F_q$ exists) that:


A. The polynomial

(11)    $g_q(x) = x^q - x$    $(q = p^\nu)$

has only distinct zeros, namely all the elements of the field $F_q$.


B. The field $F_q$ is the smallest splitting field of $g_q(x)$.


We now discard the hypothesis that the field $F_q$ exists, since we wish
to prove its existence. Since $g_q'(x) = q x^{q-1} - 1 = -1$ (q being a multiple of
the characteristic p), the polynomial $g_q(x)$ has no zero
in common with its derivative, and thus $g_q(x)$ has only distinct zeros.
We now show that the zeros of $g_q(x)$ already form a field. For let α and β
be two zeros of $g_q(x)$, so that $\alpha^q = \alpha$ and $\beta^q = \beta$. Then from (2) and (3)
it follows that $(\alpha + \beta)^q = \alpha + \beta$ and $(\alpha - \beta)^q = \alpha - \beta$, and then also
that $(\alpha \cdot \beta)^q = \alpha \cdot \beta$ and, if β ≠ 0, then $(\alpha/\beta)^q = \alpha/\beta$.

Thus the proof of theorem 4 is complete and at the same time we have
shown the general validity of the theorems A and B. Since $F_q$ is the smallest
splitting field of $g_q(x)$, it is uniquely determined up to isomorphisms.$^{18}$
More precisely: the field $F_q$ (with $q = p^\nu$) admits, as we shall show in §8.5,
exactly ν automorphisms, i.e., isomorphic mappings onto itself. It follows
that if $F_q$ and $F_q'$ are two finite fields with the same number of elements,
then an isomorphic mapping of one onto the other is possible in exactly
ν different ways.


18 Cf. §1.
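
Statement A can be verified by machine for a small field. The following Python sketch is an illustration under assumptions that are not taken from the text: it models the field with 9 elements as pairs (a, b) standing for a + bi with $i^2 = -1$ and coefficients mod 3 (the polynomial $x^2 + 1$ has no zero mod 3 and is therefore irreducible there), and the names `mul`, `power` and `ELEMENTS` are hypothetical. It checks that every element satisfies $x^q = x$ with q = 9.

```python
from itertools import product

P = 3  # characteristic; the field is modelled as F_3[i] with i*i = -1


def mul(u, v):
    """Product (a + b*i)(c + d*i) with coefficients reduced mod P."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % P, (a * d + b * c) % P)


def power(u, e):
    """u raised to the e-th power by repeated multiplication (e >= 0)."""
    result = (1, 0)
    for _ in range(e):
        result = mul(result, u)
    return result


ELEMENTS = list(product(range(P), repeat=2))  # all 9 pairs (a, b)


def all_are_zeros_of_gq():
    """True if every element u satisfies u**q == u for q = 9."""
    q = P ** 2
    return all(power(u, q) == u for u in ELEMENTS)
```

Since $x^9 - x$ has degree 9, the nine elements found in this way are all of its zeros, in agreement with A.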


8.4. The Subfields of the Field $F_q$

Let $F_{q_1}$ be a subfield of $F_q$. Since $F_{q_1}$ is a finite field with the same
characteristic p, it follows from theorem 3 that there exists a natural
number $\nu_1$ such that

(12)    $q_1 = p^{\nu_1}$.

On the other hand, by theorem 2 the field $F_q$ is a simple extension of $F_{q_1}$,
so that, if k is the degree of $F_q$ with respect to $F_{q_1}$, then the same proof
as in theorem 3 gives

(13)    $q = q_1^k$.

From (12) and (13) it follows, since $q = p^\nu$, that

(14)    $\nu = k \cdot \nu_1$.

Conversely, let $\nu_1$ be an arbitrary factor of ν, so that (14) holds. We
now set

(15)    $s = q - 1$ and $s_1 = q_1 - 1$, with $q_1 = p^{\nu_1}$.

Then $s_1$ is a factor of s, from which it follows that the polynomial
$f_{s_1}(x) = x^{s_1} - 1$ is a factor of the polynomial $f_s(x) = x^s - 1$. Since
$g_q(x) = x \cdot f_s(x)$, and correspondingly $g_{q_1}(x) = x \cdot f_{s_1}(x)$, it follows that
$g_q(x)$ is divisible by $g_{q_1}(x)$. The set of q elements of the field $F_q$, which
constitute all the zeros of $g_q(x)$, thus contains all the $q_1$ zeros of $g_{q_1}(x)$,
which (cf. §8.3, theorem B) form a field of $q_1$ elements. It is also clear
that $F_q$ can contain only one subfield with the fixed number of elements $q_1$.
Thus we have proved the following theorem.


Theorem 5. Let $F_q$ be a finite field with $q = p^\nu$ elements. Then if $\nu_1$
is an arbitrary factor of ν, the field $F_q$ contains exactly one subfield with
$q_1 = p^{\nu_1}$ elements. If $\nu_1$ runs through all the factors of ν, we obtain all the
subfields of $F_q$.


Example. Let us take $q = 3^6 = 729$. Then, apart from itself, the
field $F_q$ contains the following subfields: one with $3^1 = 3$ elements, one
with $3^2 = 9$ elements, one with $3^3 = 27$ elements, and no other subfield.
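
The count in theorem 5 amounts to listing the factors of ν. The following Python sketch (with hypothetical function names) reproduces the example and also checks the divisibility of $s = q - 1$ by $s_1 = q_1 - 1$ on which the argument of §8.4 rests.

```python
def subfield_sizes(p, nu):
    """Numbers of elements of all subfields of the field with p**nu elements."""
    return [p ** d for d in range(1, nu + 1) if nu % d == 0]


def s1_divides_s(p, nu):
    """True if q1 - 1 divides q - 1 for every subfield size q1 listed above."""
    q = p ** nu
    return all((q - 1) % (q1 - 1) == 0 for q1 in subfield_sizes(p, nu))
```

For p = 3 and ν = 6 the list is 3, 9, 27, 729, the last entry being the field itself.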


8.5. The Automorphism Group of the Field $F_q$


By theorem 1 the field $F_q$ is separable. Moreover, by theorem B (§8.3),
$F_q$ is the smallest splitting field of $g_q(x)$, from which it follows that $F_q$
is a normal extension of $\Pi^{(p)}$.$^{19}$ Since the finite fields have a particularly
simple structure, their Galois groups are also particularly easy to describe.

We first note that the prime field $\Pi^{(p)}$ obviously admits only one
automorphism, namely the identity,$^{20}$ and that $\Pi^{(p)}$ remains elementwise
fixed under every automorphism of $F_q$. Thus the Galois group $G(F_q : \Pi^{(p)})$
includes all the possible automorphisms of $F_q$. Since $F_q$ is of degree ν
with respect to $\Pi^{(p)}$, there are exactly ν of these automorphisms, so that
the order of the Galois group is ν. We may now state the following
theorem.


Theorem 6. The Galois group $G(F_q : \Pi^{(p)})$ is a cyclic group of order ν.
As a generating automorphism we may take the mapping $\alpha \to \alpha^p$.


We first show that this mapping is one-to-one. For by (3) it follows
from $\alpha^p = \beta^p$ that $(\alpha - \beta)^p = \alpha^p - \beta^p = 0$, so that α = β. Furthermore,
it follows from (1) and from $(\alpha\beta)^p = \alpha^p \beta^p$ that this mapping is an
isomorphism, and therefore an automorphism. Since the order ν of the group
is already known, it is only necessary to prove that this mapping actually is of
order ν in the Galois group. But its ith power is obviously the mapping
$\alpha \to \alpha^{p^i}$, and since the equation $x^{p^i} = x$ has at most $p^i$ solutions, ν is the
smallest positive integral value of i for which the mapping $\alpha \to \alpha^{p^i}$
becomes the identity.
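
The order of the generating automorphism can be observed in a small field. The following Python sketch is an illustration under assumptions not taken from the text: it models the field with 8 elements as polynomials over the field with 2 elements modulo $x^3 + x + 1$ (which has no zero mod 2 and is therefore irreducible), stores an element as a 3-bit integer whose bit i is the coefficient of $x^i$, and uses hypothetical function names. It applies the mapping $\alpha \to \alpha^2$ repeatedly until the identity is reached.

```python
MODULUS = 0b1011  # x^3 + x + 1, assumed model polynomial for the 8-element field


def mul(a, b):
    """Product of two field elements given as 3-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition of polynomials mod 2 is XOR
        b >>= 1
        a <<= 1                  # multiply a by x
        if a & 0b1000:
            a ^= MODULUS         # reduce as soon as degree 3 appears
    return result


def frobenius(a):
    """The mapping alpha -> alpha**p with p = 2."""
    return mul(a, a)


def frobenius_order():
    """Smallest i > 0 such that the i-th power of the mapping is the identity."""
    i = 1
    image = [frobenius(a) for a in range(8)]
    while image != list(range(8)):
        image = [frobenius(a) for a in image]
        i += 1
    return i
```

Here ν = 3, and the sketch returns the order 3; the only elements left fixed by the mapping itself are 0 and 1, that is, the elements of the prime field.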

The correspondence between the subfields of $F_q$ and the subgroups
of the Galois group is now obvious: the subgroup $G(F_q : F_{q_1})$ corresponding
to the subfield $F_{q_1}$ (with $q_1 = p^{\nu_1}$, $\nu = k \cdot \nu_1$) contains the k automorphisms
$\alpha \to \alpha^{q_1^i}$ ($i = 0, 1, \ldots, k - 1$), which are exactly those automorphisms
of $G(F_q : \Pi^{(p)})$ that leave all the elements of $F_{q_1}$ individually fixed.

In the proof of theorem 6 we have incidentally shown that in every
finite field of characteristic p the pth root of every element exists and is
unique; and then the same remark readily follows for the $p^n$th root
(n a natural number). The uniqueness of the pth root is obvious for an
arbitrary field K of characteristic p, even when K is not finite. But in the
case of an infinite field we cannot always conclude from a ∈ K that
$\sqrt[p]{a} \in K$.


Exercises 


42. Prove
(a) the polynomial $x^2 + x + 2$ falls into linear factors over $F_9$ but is
irreducible over $F_{27}$;
(b) the polynomial $x^3 + 2x^2 + 1$ falls into linear factors over $F_{27}$ but
is irreducible over $F_9$;
(c) consequently, both polynomials fall into linear factors over
$F_{729}$, so that their zeros may be chosen in $F_{729}$;

(d) if the zeros of the polynomial in (a) are denoted by $\xi_1$, $\xi_2$, and
those of the polynomial in (b) by $\eta_1$, $\eta_2$, $\eta_3$, then $F_9 = F_3(\xi_1)$,
$F_{27} = F_3(\eta_1)$, $F_{729} = F_3(\xi_1, \eta_1)$.

43. With the notation of ex. 42 investigate the structure of the five Galois
groups $G_1 = G(F_9 : F_3)$, $G_2 = G(F_{27} : F_3)$, $G_3 = G(F_{729} : F_3)$, $G_4 =
G(F_{729} : F_9)$, $G_5 = G(F_{729} : F_{27})$. In particular, determine how the
automorphisms in $G_1$, $G_3$, $G_4$, $G_5$ act on $\xi_1$, $\xi_2$ and how the
automorphisms in $G_2$, $G_3$, $G_4$, $G_5$ act on $\eta_1$, $\eta_2$, $\eta_3$.


19 Cf. §3.
20 Cf. the remarks at the end of §8.3.


8.6. Decomposition of the Cyclotomic Polynomial $\Phi_h(x)$ over Finite Fields


The particularly simple structure of the finite fields depends partly
on the fact that $F_q$ is a normal extension of its prime field $\Pi^{(p)}$ and of
every intermediate field (cf. beginning of §8.5). If $F_{q_1}$ is a subfield of $F_q$,
the degree k of $F_q$ with respect to $F_{q_1}$ is equal to the order of the Galois
group $G(F_q : F_{q_1})$ (see the next-to-last paragraph of §8.5).

We now turn to the problem of factoring the cyclotomic polynomial
$\Phi_h(x)$.$^{21}$ Obviously we must assume that h is coprime to p. We let η
denote a primitive hth root of unity. Since the elements of $F_q$, apart
from its zero element, consist of all the (q - 1)th roots of unity, the
element η will belong to $F_q$ if and only if h is a factor of q - 1. But
(cf. IB6, §4.2) we also have


$p^{\varphi(h)} \equiv 1 \pmod{h}$.


Now let e be the exact exponent to which p belongs mod h, namely the
smallest positive integer satisfying the congruence


$p^e \equiv 1 \pmod{h}$;


then η is obviously contained in $F_{p^e}$ but not in any proper subfield of $F_{p^e}$.
Thus η is of degree e with respect to $\Pi^{(p)}$. Since this statement is true
for every primitive hth root of unity, we have the following theorem.


Theorem 7. Let (h, p) = 1 and let p belong to the exponent e modulo h.
Then $\Phi_h(x)$ splits into irreducible polynomials of degree e over $\Pi^{(p)}$.
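
The exponent e of theorem 7 is easily computed. The following Python sketch (with hypothetical function names) finds e by successive multiplication and, using the additional fact that $\Phi_h(x)$ has degree φ(h), which is not stated in the theorem itself, also returns the number φ(h)/e of irreducible factors.

```python
from math import gcd


def exponent_of(p, h):
    """Smallest e > 0 with p**e congruent to 1 modulo h; needs gcd(p, h) == 1."""
    if gcd(p, h) != 1:
        raise ValueError("p and h must be coprime")
    e, x = 1, p % h
    while x != 1 % h:            # 1 % h is 0 for h == 1, so the loop ends at once
        x = (x * p) % h
        e += 1
    return e


def euler_phi(h):
    """Number of residues k in 1..h that are coprime to h."""
    return sum(1 for k in range(1, h + 1) if gcd(k, h) == 1)


def cyclotomic_shape(p, h):
    """Pair (degree e of each irreducible factor, number of such factors)."""
    e = exponent_of(p, h)
    return e, euler_phi(h) // e
```

For p = 3 and h = 8 the result is (2, 2): over the prime field of characteristic 3 the polynomial $\Phi_8(x)$ splits into two quadratic factors, in agreement with ex. 39.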

In order to answer the question how $\Phi_h(x)$ splits over an arbitrary $F_q$,
we need only investigate how a polynomial that is irreducible over a
given finite field splits in a finite extension field. Here we have the following
theorem.


21 Cf. §5.4.

Theorem 8. Let $F_q$ be an extension field of $F_{q_1}$, let $q = p^\nu$, and assume
(12) and (14). Let f(x) be a polynomial of degree n irreducible over $F_{q_1}$. Let
the greatest common factor of n and k be d, and finally let $n = d \cdot \bar{n}$.
Then f(x) splits over $F_q$ into d irreducible polynomials of degree $\bar{n}$.


Proof. Let α be a zero of f(x) in an extension field of $F_q$. Then we
consider the four fields $F_{q_1}$, $F_q$, $F_{q_1}(\alpha)$, and $F_q(\alpha)$. In the diagram these
fields are represented by circles. Where two fields are joined by a straight
line in the diagram, the upper is an extension field of the lower, and
the relative degree is written beside the line. In particular, n* denotes
the degree of $F_q(\alpha)$ with respect to $F_q$. Then we must show that $n^* = \bar{n}$.

We set $k = d \cdot \bar{k}$. From (n, k) = d it then follows that $(\bar{n}, \bar{k}) = 1$.

The number of elements in $F_{q_1}(\alpha)$ is $p^{\nu_1 n}$ and the number in $F_q(\alpha)$ is $p^{\nu n^*}$.
Since $F_{q_1}(\alpha)$ is a subfield of $F_q(\alpha)$, it follows from §8.4 that $\nu_1 \cdot n \mid \nu \cdot n^*$.
But $n = d \cdot \bar{n}$, $\nu = \nu_1 \cdot k = \nu_1 \cdot d \cdot \bar{k}$, so that $\nu_1 \cdot d \cdot \bar{n} \mid \nu_1 \cdot d \cdot \bar{k} \cdot n^*$ and
therefore $\bar{n} \mid \bar{k} \cdot n^*$. But then from $(\bar{n}, \bar{k}) = 1$ we have $\bar{n} \mid n^*$, so that we
can set $n^* = c \cdot \bar{n}$.

Now $F_q(\alpha)$ is obviously the smallest common extension field of $F_{q_1}(\alpha)$
and $F_q$. But $F_q(\alpha)$ has $p^{\nu n^*} = p^{\nu c \bar{n}}$ elements. Thus by §8.4 the field $F_q(\alpha)$
contains a subfield K with $p^{\nu \bar{n}}$ elements. Then, since $\nu \mid \nu \bar{n}$, it follows
from §8.4 that the field K contains a subfield with $p^\nu$ elements, which
must therefore be a subfield of $F_q(\alpha)$. But by §8.4 the field $F_q(\alpha)$ cannot
contain any field other than $F_q$ with $p^\nu$ elements. Consequently $F_q$ is a
subfield of K. In the same way we show that $F_{q_1}(\alpha)$ is a subfield of K.
To this end we must first show that $\nu_1 \cdot n \mid \nu \cdot \bar{n}$. But this fact is immediately
obvious, since $n = d \cdot \bar{n}$ and $\nu = \nu_1 \cdot d \cdot \bar{k}$. Since $F_q(\alpha)$ cannot contain
any field other than $F_{q_1}(\alpha)$ with $p^{\nu_1 n}$ elements, it follows that $F_{q_1}(\alpha)$ is also a subfield
of K. On the other hand, $F_q(\alpha)$ is the smallest common extension field of
$F_q$ and $F_{q_1}(\alpha)$; thus we must have c = 1 and therefore $n^* = \bar{n}$. Since
this result holds for every zero α of f(x), the proof of theorem 8 is complete.
The results of theorems 7 and 8 provide the solution of the problem dealt 
with in the present section. By Theorem 7 (with the same notation as in 
that theorem) the polynomial ®,(x) splits over the prime field J7™ = F, 
into irreducible polynomials of degree e. From Theorem 8 we see that 
we must set v, = 1, n = e. Thus k = v and d = (e, v). We thus have 
the following theorem. 


Theorem 9. Let $(h, p) = 1$, let $p$ belong mod $h$ to the exponent $e$,
let $q = p^{\nu}$, and finally let $(e, \nu) = d$. Then $\Phi_h(x)$ splits over $F_q$ into
irreducible polynomials of degree $e/d$.

Remark. It is easy to see that the number $e/d$ in theorem 9 is the
exact exponent to which $q$ belongs mod $h$.
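
The quantities in Theorem 9 and in the Remark are easy to compute numerically. The following is only a minimal sketch: the function names `multiplicative_order` and `factor_degree` are our own, and the order is found by naive repeated multiplication, which is adequate for small $h$.

```python
from math import gcd

def multiplicative_order(a: int, h: int) -> int:
    """Smallest e >= 1 with a**e ≡ 1 (mod h); requires gcd(a, h) == 1."""
    if gcd(a, h) != 1:
        raise ValueError("a must be coprime to h")
    e, power = 1, a % h
    while power != 1 % h:
        power = (power * a) % h
        e += 1
    return e

def factor_degree(h: int, p: int, nu: int) -> int:
    """Degree e/d of the irreducible factors of Phi_h(x) over F_q, q = p**nu (Theorem 9)."""
    e = multiplicative_order(p, h)   # exponent to which p belongs mod h
    d = gcd(e, nu)                   # d = (e, nu)
    return e // d

# Hypothetical sample values: h = 65, p = 3, nu = 6, so q = 729.
print(factor_degree(65, 3, 6))          # degree of the factors over F_729
print(multiplicative_order(3**6, 65))   # exponent to which q belongs mod h (Remark)
```

For these sample values both printed numbers agree, as the Remark asserts.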


Examples. 1. We set $h = 12$ throughout, so that we must exclude the
characteristics 2 and 3. By §5.5 of the present chapter $\Phi_{12}(x) = x^4 - x^2 + 1$.
For every number $a$ coprime to 12 we have $a^2 \equiv 1 \bmod 12$; thus the prime
numbers different from 2 and 3 can be divided into two classes, those belonging
to the exponent 1 mod 12 and those belonging to the exponent 2. If $p$ is in the
first class ($p = 13$, $p = 37$, ...), then $\Phi_{12}(x)$ splits into linear factors in $\Pi^{(p)}[x]$;
if $p$ belongs to the second class ($p = 5, 7, 11, \ldots$), then $\Phi_{12}(x)$ splits in $\Pi^{(p)}[x]$
into irreducible polynomials of the second degree, and if $\nu$ is even these latter
polynomials split over $F_{p^{\nu}}$ into linear factors, whereas for odd $\nu$ they remain
irreducible.
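
The two classes of primes in Example 1 can be observed directly by searching for zeros of $\Phi_{12}(x) = x^4 - x^2 + 1$ in the prime field. The sketch below (the helper name `phi12_roots` is our own) only tests for linear factors by brute force; it does not itself produce the quadratic factors.

```python
def phi12_roots(p: int) -> list[int]:
    """Zeros of Phi_12(x) = x^4 - x^2 + 1 in the prime field with p elements."""
    return [x for x in range(p) if (x**4 - x**2 + 1) % p == 0]

# Primes with p ≡ 1 (mod 12) give four zeros (four linear factors);
# the other primes coprime to 12 give none.
for p in (5, 7, 11, 13, 37):
    print(p, p % 12, phi12_roots(p))
```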


2. We now set $q = 3^6 = 729$ throughout. Then we have the factorization
$q - 1 = 728 = 2^3 \cdot 7 \cdot 13$. If $h \mid 728$, then $3^6 \equiv 1 \bmod h$, so that $e \mid 6$ and
$(e, 6) = e$. In the case $h \mid 728$ the polynomial $\Phi_h(x)$ thus splits into linear factors
over $F_{729}$, a result which can also be obtained from the fact that $F_{729}$ contains
all 728th roots of unity and consequently all the $h$th roots of unity with $h \mid 728$.
We now choose $h = 65$. Since 3 belongs to the exponent 4 mod 5 and to the
exponent 3 mod 13, it therefore belongs to the exponent 12 mod 65. Thus
$d = (12, 6) = 6$, so that by theorem 9 the polynomial $\Phi_{65}(x)$ splits over $F_{729}$ into
irreducible polynomials of degree $12/6 = 2$.


Theorem 10. There exist irreducible polynomials of arbitrary degree $n$
in $F_q[x]$. Every such polynomial splits completely in $F_{q^n}[x]$ and is thus a
divisor of $x^{q^n} - x$ and, apart from the single irreducible polynomial $x$
(for $n = 1$), every such polynomial is even a divisor of $f_S(x) = x^S - 1$,
where for abbreviation we have set $S = q^n - 1$. Conversely, the polynomials
$x^{q^n} - x$ and $f_S(x)$ split in $F_q[x]$ into irreducible polynomials whose degrees
are divisors of $n$.


We give the proof of the first assertion. By theorem 4 (§8.3) there exists
a field $K$ with $q^n = p^{\nu n}$ elements; by theorem 5 (§8.4) this field $K$ contains
a subfield $L$ with $q = p^{\nu}$ elements. If $K = L(\alpha)$ (cf. §8.2, theorem 2),
then the irreducible polynomial in $L[x]$ corresponding to $\alpha$ is of degree $n$.
Since $F_q$ is isomorphic to $L$ by theorem 4 of §8.3, it follows that $F_q[x]$ also
contains an irreducible polynomial of degree $n$. The other assertions of
theorem 10 can then be derived without difficulty from the results of §8.


Exercises 


44, What is the degree of the cyclotomic polynomial ®{.? Into what 
factors does it split over Fyy (1 < v < 12)? 


8.7. Closing Remarks 
Our knowledge of finite fields is due to the genius of Galois.²² Galois


22 Evariste Galois, born in 1811, met his death in a duel in 1832.
obtained the finite field with $p^{\nu}$ elements by starting from a polynomial
$f(x)$, irreducible mod $p$, with integral coefficients and of degree $\nu$ mod $p$,
and then adjoining an "imaginary" zero of $f(x)$ to the field of residue
classes mod $p$.²³ The reader will realize from §8.5 how closely this procedure
is associated with the revolutionary ideas that are now called the "Galois
theory."

The original Galois procedure of symbolic adjunction of an "imaginary"
naturally gave rise to the idea of altering his method by considering
congruences with respect to the double modulus $p$ and $f(x)$ in the domain
of polynomials with integral coefficients. It is easy to see that this method,
which was developed by R. Dedekind in 1857,²⁴ also leads to the Galois
field GF($p^{\nu}$).

Dedekind also made use of the finite fields (or in other words, of the 
theory of higher congruences) for the investigation of algebraic number 
fields. In this connection we may point out that in the ring of integers 
of an algebraic number field (IB6, §8) the residue classes with respect 
to a prime ideal constitute a finite field. 

The importance of the finite fields for the study of groups of linear
substitutions was made clear by L. E. Dickson in a publication of
fundamental importance.²⁵

If an indeterminate $t$ is adjoined to a finite field $F_q$, the resulting field
$F_q(t)$ shows far-reaching analogies with the field of rational numbers
but its arithmetical structure is simpler in many respects, and by algebraic
extension of $F_q(t)$ we obtain fields that correspond to the algebraic number
fields. The study of these fields has led to the development of a general
theory²⁶ of fields and ideals, and they have proved very useful for
illustrating the abstract theorems of such theories with simple but nontrivial
examples.


9. Irreducibility of the Cyclotomic Polynomial and Structure 
of the Galois Group of the Cyclotomic Field 
over the Field of Rational Numbers 


9.1. Lemmas on the Connection between
the Polynomial Rings $G[x]$ and $\Pi[x]$


Again we let $\Pi$ be the field of rational numbers and $G$ the integral
domain of integers. Then $\Pi[x]$ contains the polynomials with rational


23 Œuvres mathématiques, published by E. Picard, Paris, 1897.

24 Abriss einer Theorie der höheren Kongruenzen in Bezug auf einen reellen Primzahl-
Modulus. Journal f. d. reine u. ang. Math., Vol. 54, pp. 1-26.

25 Linear groups with an exposition of the Galois field theory, Leipzig, 1901.

26 Cf. the important work of E. Steinitz [1].
coefficients and $G[x]$ contains the polynomials with integral coefficients,
so that $G[x]$ is contained in $\Pi[x]$. (The ideas and theorems developed
in §9.1 apply only to nonzero polynomials, as will be tacitly assumed
throughout.) A polynomial in $G[x]$ is said to be primitive if there is no
factor (except 1 and $-1$) common to all its coefficients; moreover, if the
coefficient of the highest power of $x$ is positive, the polynomial is said
to be normed.²⁷ Then we have the well-known theorem of Gauss:²⁸ the
product of two primitive polynomials is primitive, from which it readily
follows that the product of two normed polynomials is normed. If $g(x)$ is a
polynomial in $\Pi[x]$, there exists exactly one representation


(1)   $g(x) = \frac{a}{b}\, g^+(x),$


where $g^+(x)$ is a normed polynomial and $a$ and $b$ are coprime integers, with
$b$ positive. When we speak of normed polynomials, we shall always mean
polynomials from $G[x]$.


Theorem 1. Let $f^+(x)$ be a normed polynomial which admits the
factorization


(2)   $f^+(x) = g(x)\, h(x)\, k(x) \cdots,$


where $g(x), h(x), k(x), \ldots$ are finitely many polynomials of positive degree
in $\Pi[x]$. Then $f^+(x)$ also admits the factorization


$f^+(x) = g^+(x)\, h^+(x)\, k^+(x) \cdots,$


where $g^+(x), h^+(x), k^+(x), \ldots$ are the normed polynomials corresponding
to $g(x), h(x), k(x), \ldots$, respectively. If the coefficient of the highest power
of $x$ in $f^+(x)$ is equal to 1, then the same is true for $g^+(x), h^+(x), k^+(x), \ldots$

We give the proof for the case of two factors $g(x)$ and $h(x)$. By (1)
we have


$g(x) = \frac{a}{b}\, g^+(x), \qquad h(x) = \frac{c}{d}\, h^+(x),$


and thus

$f^+(x) = \frac{ac}{bd}\, g^+(x)\, h^+(x).$


But if $g^+(x)$ and $h^+(x)$ are normed, then by the theorem of Gauss the
polynomial $g^+(x)\, h^+(x)$ is also normed, from which it easily follows, by the
uniqueness of the representation (1), that


$f^+(x) = g^+(x)\, h^+(x).$


27 For brevity we say "normed" for "normed and primitive."
28 Cf. IB5, §2.11 and 12.
If we now set

$f^+(x) = a_n x^n + \cdots, \qquad g^+(x) = b_r x^r + \cdots, \qquad h^+(x) = c_s x^s + \cdots,$


we have $a_n = b_r c_s$. If $a_n = 1$, it follows that $b_r = c_s = 1$, since $b_r$ and $c_s$
are positive integers.
The generalization to the case of more than two factors offers no
difficulty.


9.2. Proof of the Irreducibility of $\Phi_n(x)$ over $\Pi$


In §5.4 of the present chapter we introduced $\Phi_n(x)$ as that polynomial
(with leading coefficient 1) which has exactly the primitive $n$th roots
of unity as its zeros. Since in §9 the characteristic is always taken to be 0,
the polynomial $\Phi_n(x)$ has integral coefficients (§5, theorem 5), and since
the coefficient of the highest power of $x$ is equal to 1, the polynomial
$\Phi_n(x)$ is normed. By theorem 1 of §9.1 it then follows: if $\Phi_n(x)$ is reducible
at all, there must exist a factorization of the form


(3)   $\Phi_n(x) = g_1(x) \cdot g_2(x) \cdots g_k(x),$


where $g_\kappa(x)$ (for $\kappa = 1, 2, \ldots, k$) is a normed²⁹ polynomial, irreducible
over $\Pi$, with leading coefficient 1. On the other hand, if $\Phi_n(x)$ is
irreducible, then in the representation (3) we must obviously take $k = 1$.
Thus the purpose of the present section is to prove that $k = 1$. This
purpose is almost completely attained by the following lemma.
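
That $\Phi_n(x)$ has integral coefficients and leading coefficient 1 can be observed by computing it explicitly. The following sketch assumes the standard product formula $x^n - 1 = \prod_{d \mid n} \Phi_d(x)$ and divides out the $\Phi_d$ with $d < n$; the function names are our own, and the recursion is naive and intended only for small $n$.

```python
def poly_divide_exact(num: list[int], den: list[int]) -> list[int]:
    """Exact division of integer polynomials (coefficients listed from the
    highest power down); den must be monic and must divide num."""
    num = num[:]
    quotient = []
    for i in range(len(num) - len(den) + 1):
        coeff = num[i]              # den is monic, so no coefficient division is needed
        quotient.append(coeff)
        for j, d in enumerate(den):
            num[i + j] -= coeff * d
    if any(num):
        raise ValueError("division is not exact")
    return quotient

def cyclotomic(n: int) -> list[int]:
    """Coefficients of Phi_n(x), highest power first."""
    result = [1] + [0] * (n - 1) + [-1]          # x^n - 1
    for d in range(1, n):
        if n % d == 0:                           # divide out Phi_d for each proper divisor d
            result = poly_divide_exact(result, cyclotomic(d))
    return result

print(cyclotomic(12))   # [1, 0, -1, 0, 1], i.e. x^4 - x^2 + 1
```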


Lemma. If $\zeta$ is a zero of $g_1(x)$ and $p$ is a prime number that does not
divide $n$, then $\zeta^p$ is also a zero of $g_1(x)$.


Proof. Since $p$ does not divide $n$, the element $\zeta^p$ is also a primitive
$n$th root of unity; thus there exists an $i$ such that $\zeta^p$ is a zero of $g_i(x)$.
(Our assertion is equivalent to the statement that $i = 1$.) We set


(4)   $g_i(x) = x^r + b_{r-1} x^{r-1} + \cdots + b_1 x + b_0.$

Since $\zeta^p$ is a zero of $g_i(x)$, it follows that

(5)   $(\zeta^p)^r + b_{r-1} (\zeta^p)^{r-1} + \cdots + b_1 \zeta^p + b_0 = 0.$

But this equation means that the polynomial

(6)   $g_i(x^p) = x^{pr} + b_{r-1} x^{p(r-1)} + \cdots + b_1 x^p + b_0$


has $\zeta$ as a zero. Since $g_1(x)$ is the irreducible polynomial corresponding
to $\zeta$, it follows that


(7)   $g_1(x) \mid g_i(x^p),$


29 From now on the superscript + for normed polynomials will be omitted.
and thus

(8)   $g_i(x^p) = q(x) \cdot g_1(x),$


where $q(x)$ is also a normed polynomial in $G[x]$ with leading coefficient 1.

Again letting $H^{(p)}$ denote the homomorphism of $G$ onto $\Pi^{(p)}$ (cf. §5.4),
so that the number $g$ in $G$ is mapped onto the corresponding residue
class $\bar{g}$ mod $p$, we now extend $H^{(p)}$ to the homomorphism $H^{(p)}$ of $G[x]$
onto $\Pi^{(p)}[x]$ by agreeing that $f(x) = a_r x^r + \cdots + a_0$ is to be mapped
onto $\bar{f}(x) = \bar{a}_r x^r + \cdots + \bar{a}_0$. As a result of $H^{(p)}$ it follows from (8) that


(9)   $\bar{g}_i(x^p) = \bar{q}(x) \cdot \bar{g}_1(x).$

Moreover, from (3) and §5, theorem 6, we have

(10)   $\Phi_n^{(p)}(x) = \bar{g}_1(x) \cdot \bar{g}_2(x) \cdots \bar{g}_k(x),$


where the polynomials $\bar{g}_1(x), \ldots, \bar{g}_k(x)$ are not necessarily irreducible.
Then by formula §8(6) we have


(11)   $\bar{g}_i(x^p) = [\bar{g}_i(x)]^p.$

From (9) it thus follows that

(12)   $[\bar{g}_i(x)]^p = \bar{q}(x) \cdot \bar{g}_1(x).$


Now let $\xi$ be a zero of $\bar{g}_1(x)$ (in an extension field of $\Pi^{(p)}$). Then (12) shows
that $\xi$ is also a zero of $[\bar{g}_i(x)]^p$ and consequently of $\bar{g}_i(x)$. Since $p \nmid n$,
it follows from §5 that $\Phi_n^{(p)}(x)$ can have no multiple zeros, whereas for $i \neq 1$
the element $\xi$ would by (10) be a multiple zero; from this
we see that $i = 1$, so that the proof of the lemma is complete.

Now it is easy to prove the following theorem. 


Theorem 2. $\Phi_n(x)$ is irreducible over $\Pi$.

Proof. Let $\eta$ be a primitive $n$th root of unity and let $\zeta$ be a zero of
$g_1(x)$. Then $\eta = \zeta^r$, with $r$ coprime to $n$. Let

(13)   $r = p_1 \cdot p_2 \cdots p_m$

be the factorization of $r$ into prime numbers. Every $p_\mu$ is coprime to $n$.
We now set

(14)   $\eta_1 = \zeta^{p_1}; \quad \eta_2 = \eta_1^{p_2}; \quad \ldots; \quad \eta_m = \eta_{m-1}^{p_m}.$


Then by an $m$-fold application of the lemma we see that $\eta_1$ is a zero of
$g_1(x)$, and that $\eta_2, \ldots, \eta_m = \eta$ are also zeros of $g_1(x)$. Since every primitive
$n$th root of unity is thus a zero of $g_1(x)$, it is clear that $k = 1$ in (3), or
in other words that $\Phi_n(x)$ is irreducible.


9.3. Structure of the Galois Group of the Field $\Pi(\zeta)$


Since $\Phi_n(x)$ is irreducible by §9.2, it follows that the field $\Pi(\zeta)$ is of
degree $\varphi(n)$ over $\Pi$, where $\zeta$ again denotes a primitive $n$th root of unity,
and since $\Pi(\zeta)$ contains all the zeros of $\Phi_n(x)$, it is necessarily a normal
field. Let $\eta_1, \eta_2, \ldots, \eta_u$ (with $u = \varphi(n)$) be the entire set of primitive
$n$th roots of unity, including $\zeta$. Then the $u$ automorphisms of the Galois
group $\mathfrak{G}$ of the field $\Pi(\zeta)$ with respect to $\Pi$ can be represented in
the following way:


(15)   $\zeta \to \eta_1; \quad \zeta \to \eta_2; \quad \ldots; \quad \zeta \to \eta_u.$

Here $\eta_i = \zeta^{a_i}$, where the number $a_i$ coprime to $n$ is determined only mod $n$.


Theorem 3. The Galois group $\mathfrak{G}$ of the field $\Pi(\zeta)$ with respect to
$\Pi$ is isomorphic to the multiplicative group of the relatively prime residue
classes mod $n$ (for this group see IB2, §1.2.10).


Proof. For $\zeta \to \eta_i$ we can write $\zeta \to \zeta^{a_i}$. If $\zeta \to \eta_j$ is also an
automorphism of $\mathfrak{G}$, we can write it correspondingly in the form $\zeta \to \zeta^{a_j}$.
If we apply the two automorphisms successively, we obviously obtain
the automorphism $\zeta \to \zeta^{a_i a_j}$. Thus the theorem is proved.

It follows that $\mathfrak{G}$ is an Abelian group. If $p$ is an odd prime, then in the
cases $n = p^k$ and $n = 2p^k$ it even follows that the group $\mathfrak{G}$ is cyclic, and
similarly for $n = 4$. On the other hand, for example, the group $\mathfrak{G}$ is no
longer cyclic for $n = 8$ and $n = 15$.
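
By Theorem 3 the question of whether $\mathfrak{G}$ is cyclic can be decided inside the group of relatively prime residue classes mod $n$. The sketch below searches for a residue class whose order equals $\varphi(n)$; the function names are our own and the search is by brute force, so it is only meant for small $n$.

```python
from math import gcd

def units(n: int) -> list[int]:
    """Representatives of the relatively prime residue classes mod n."""
    return [a for a in range(1, n + 1) if gcd(a, n) == 1]

def order_mod(a: int, n: int) -> int:
    """Order of the residue class of a in the multiplicative group mod n."""
    e, power = 1, a % n
    while power != 1 % n:
        power = (power * a) % n
        e += 1
    return e

def is_cyclic(n: int) -> bool:
    """True if some residue class generates the whole group of units mod n."""
    group = units(n)
    return any(order_mod(a, n) == len(group) for a in group)

for n in (4, 8, 9, 15, 18):
    print(n, len(units(n)), is_cyclic(n))
```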

As will be shown in the following section, the possibility of dividing a
given circle into $n$ equal parts by means of ruler and compass depends
on the structure of the Galois group $\mathfrak{G}$ of the field $\Pi(\zeta)$. It turns out
that this construction is possible if and only if the order $\varphi(n)$ of $\mathfrak{G}$ is a
power of 2.


Exercises 


45. Let $\zeta$ be a primitive $n$th root of unity. Investigate in detail the structure
of the Galois group of the field $\Pi(\zeta)$ with respect to $\Pi$ for the
following values of $n$: 8, 9, 12, 18, 24, 36, 72, 35, 175.


10. Solvability by Radicals. Equations of the Third and Fourth 
Degree 


10.1. By means of the Galois theory (§7) we can answer the question
of the solvability of "an equation by radicals" or better of "a polynomial
by radicals." By a radical of degree $k$ over the field $T$ we mean a zero of
a binomial $x^k + c$ of degree $k$ which is irreducible over $T$. Also, we shall
say that a polynomial $g(x)$ irreducible over $K$ is solvable by radicals
(over $K$) if there exists a splitting field in the wider sense Zp = $K(b_1, \ldots, b_m)$
such that $b_\mu$ is a radical over $K_{\mu-1} = K(b_1, \ldots, b_{\mu-1})$; here $K_0 = K$ and
$\mu = 1, \ldots, m$. We then have the following theorem (which will not be
proved here; see e.g., Haupt [2], p. 515).


Theorem 1. Let $K$ be a field which (for simplicity) we take to be of
characteristic zero. A polynomial $g(x) \in K[x]$ is solvable by radicals if and
only if the (simple) factor groups of a composition series (cf. IB2, §12)
of the Galois group $G(N : K)$ are all of prime order; here $N$ denotes the
smallest splitting field of $g(x)$ over $K$.


Thus in the case of solvability by radicals the smallest splitting field $N$
(which of course is contained in Zp) can be generated by successive
adjunction of zeros of normal³⁰ polynomials of prime degree with cyclic
group; each of these zeros can then itself be represented by radicals.
Since in general $N$ is a proper subfield of Zp, the solution by radicals is
in general not a "natural" process of solution; in other words, it is not
a process which corresponds to the structure of $N$.


10.2. By means of §10.1 we can decide, for example in the case of
characteristic zero, for which degrees $n$ a "general" polynomial $g(x)$
is solvable in radicals; here a polynomial $g(x) = x^n + a_{n-1} x^{n-1} + \cdots + a_0$
is said to be a general polynomial (over $K$) if the coefficients $a_\nu$ are also
indeterminates over $K$.³¹ The zeros of a general polynomial $g(x)$ over $K$ are
indeterminates over $K$ (see Haupt [2], p. 177), and thus are distinct; on
the other hand, since $g(x)$ is irreducible over $K(a_0, \ldots, a_{n-1})$ (see Haupt [2],
p. 256) the polynomial $g(x)$ is separable. The group of the normal field of
a general polynomial is the "largest possible"; in other words (see Haupt [2],
p. 556), it is isomorphic to the symmetric group $S_n$ of degree $n$ (see IB2,
§1.2.5 and §15.2), and is thus of order $n!$. But for $n \geq 5$ the only possible
composition series consists of $S_n$, $A_n$, $E$, where $E$ is the subgroup
consisting of the unit element alone and $A_n$ is the so-called alternating group,
with order $\frac{1}{2} n!$ (for $n = 5$, cf. IB2, §15.4; in general, see e.g., Haupt [2],
p. 560 ff.). For $n \geq 5$ the orders of the factor groups are therefore 2 and
$\frac{1}{2} n!$, the latter of which is not a prime number.

For $n = 4$ a composition series of the group $S_n = S_4$ consists of
$S_4$, $A_4$, $N_4$, $Z_2$, $E$, where $A_4$ is again the alternating group, $N_4$ is the
Abelian subgroup of 4th order, which is generated by the 3 products $P_i$ of
two transpositions without common element, and $Z_2$ is the cyclic group


30 A polynomial $h(x) \in K[x]$ is said to be normal over $K$ if there exists a zero $\alpha$ of
$h(x)$ such that $h(x)$ splits completely into linear factors in $K(\alpha)[x]$.

31 Thus in a general polynomial $g(x)$ the $a_0, \ldots, a_{n-1}, x$ are indeterminates over $K$.
For this concept see IB4, §2.3.
of 2nd order generated, say, by $P_1$. Thus for $n = 4$ the orders of the
factor groups are 2, 3, 2, 2. Finally, for $n = 3$ a composition series
consists of $S_3$, $A_3$ and $E$ with 2 and 3 as the orders of the factor groups.
Thus we have the result: the general polynomial of $n$th degree is solvable
in radicals if and only if $n = 2$, 3, or 4.


10.3. From §10.1 we also obtain a necessary and sufficient condition
that a regular $n$-gon $P_n$, inscribed in a circle (of radius 1), is constructible
with ruler and compass. If we represent $P_n$ in the complex plane, the
problem is seen to be equivalent to expressing the zeros of the polynomial
$x^n - 1$ over the field $\Pi^{(0)}$ of rational numbers by means of radicals
of the 2nd degree; for the vertices of $P_n$ are the images of the $n$th roots
of unity, which are representable as powers of a primitive $n$th root of
unity, say $\zeta$; and therefore $N = \Pi^{(0)}(\zeta)$ must be normal over $\Pi^{(0)}$,
since it is the smallest splitting field of the irreducible (over $\Pi^{(0)}$) cyclotomic
polynomial $\Phi_n(x)$ of degree $\varphi(n)$. Since the Galois group $G_n = G(N : \Pi^{(0)})$
(§9, theorem 3) is Abelian, the number $\zeta$ can be represented by radicals,
in view of the fact that the factor groups of a composition series of an
Abelian group are all (cyclic) of prime order (cf. IB2, §12.1). The product
of these prime orders is equal to the order of $G_n$, and is thus equal to
$\varphi(n)$. For $\Phi_n(x)$ to be solvable in radicals of the 2nd degree it is therefore
necessary that $\varphi(n) = 2^s$. But this condition is also sufficient for solvability
by radicals of the 2nd degree. For if $\varphi(n) = 2^s$, the increasing sequence
of relative normal fields $K_\sigma$ corresponding to a composition series
of $G_n$ has the following property: $K_{\sigma+1}$ is of 2nd degree over $K_\sigma$, so that
$K_{\sigma+1} = K_\sigma(a)$, where $a$ is a radical of 2nd degree over $K_\sigma$. We now let
$n = 2^r p_1^{t_1} \cdots p_m^{t_m}$, with $r \geq 0$, $t_\mu \geq 1$, $p_\mu > 2$, where the $p_\mu$ (and also the
number 2 in the case $r > 0$) are the only prime numbers dividing $n$;
then, since $\varphi(n) = 2^q p_1^{t_1 - 1} \cdots p_m^{t_m - 1} (p_1 - 1) \cdots (p_m - 1)$ (see IB6, §5),
with $q = 0$ for $r \leq 1$, it follows from $\varphi(n) = 2^s$ that $t_\mu = 1$ and
$p_\mu = 2^{s_\mu} + 1$, $\mu = 1, \ldots, m$. We thus have the result: the $n$-gon $P_n$ is
constructible with ruler and compass if and only if $n = 2^t$ with $t > 1$,
or else $n = 2^k p_1 \cdots p_m$, where $k \geq 0$, $m \geq 1$ and $p_\mu$ is a prime with the
property that $p_\mu = 2^{s_\mu} + 1$, $\mu = 1, \ldots, m$. (Not every number $2^s + 1$ is a
prime number.) Examples: the 4 smallest primes $p_\mu$ with this property
are 3, 5, 17, 257. Thus for an $n$ that contains, for example, one of the
factors 7, 11, 13, 19, the corresponding $P_n$ cannot be constructed with
ruler and compass.
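
The criterion just obtained is easily tested numerically in the form "$\varphi(n)$ is a power of 2." The following sketch uses our own function names, computes $\varphi(n)$ by direct counting, and restricts attention to $n \geq 3$ (an assumption made here so that $P_n$ is a genuine polygon).

```python
from math import gcd

def euler_phi(n: int) -> int:
    """Number of residue classes mod n that are coprime to n."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def is_constructible(n: int) -> bool:
    """Regular n-gon criterion of §10.3: phi(n) must be a power of 2."""
    phi = euler_phi(n)
    return n >= 3 and (phi & (phi - 1)) == 0   # bit test for a power of 2

print([n for n in range(3, 21) if is_constructible(n)])
```

The values 7, 9, 11, 13, 14, 18, 19 are absent from the printed list, in agreement with the remark on the factors 7, 11, 13, 19.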


Exercises 


46. Take K = IT, and 
(a) $g(x) = x^3 - 3x^2 + 3x + 17$ (cf. exs. 6, 12, 19, 22),
(b) $g(x) = x^3 + 6x + 2$ (cf. exs. 7, 12, 19, 24, 33),
(c) $g(x) = x^3 - 3x + 1$ (cf. exs. 8, 13, 16, 20, 22),
(d) g(x) = x® — 2 (ef. exs. 15, 22), 

(e) g(x) = x® + 10x? + 125 (cf. exs. 25, 32, 34), 
(f) $g(x) = x^4 - x^2 + 1$ (cf. ex. 30),

(zg) g(x) = D,,(x) (cf. exs. 31, 45). 


CHAPTER 8 


Complex Numbers and Quaternions 


1. The Complex Numbers 


1.1. Geometric Representation 


Given a Cartesian coordinate system in the real Euclidean plane, the 
entire set of real numbers can be put in one-to-one correspondence 
with the dilatations of the plane, with the origin as center, in such 
a way that the real number a corresponds to the mapping 


(1) x’ = ax, y’ = ay. 


Here (x, y), (x’, y’) are the coordinates of a point and its image, and 
(though often excluded in other contexts) the zero dilatation with a = 0 
is here included. This mapping of the set of real numbers onto the set of 
dilatations is an isomorphism with respect to multiplication; for it is 
obvious that the product of two real numbers is mapped onto the product, 
i.e., the successive application, of the corresponding dilatations. 

For $a = -1$ the mapping (1) represents a rotation through 180 degrees.
If we now wish to extend the domain of real numbers in such a way as
to include an element $i$ with $i^2 = -1$, it is natural to define $i$ as the rotation
through 90 degrees, since its square (repeated application) is exactly a
rotation through 180 degrees.¹ Thus it is appropriate for us to include
the rotations (again with origin as center), or in other words the mappings


(2)   $x' = ax - by, \quad y' = bx + ay$


with $a^2 + b^2 = 1$; here the angle of rotation $\varphi$ is determined by $a = \cos\varphi$,
$b = \sin\varphi$. Then in order to have unrestricted multiplication we must


1 Of course, the same remark also holds for the rotation through 270 degrees, which
could just as well be taken as the definition of $i$.
also include arbitrary products of a dilatation with a rotation, i.e., the
dilative rotations, which are obtained by simply dropping the restriction
$a^2 + b^2 = 1$ in (2).

By (2) the dilative rotations are set in one-to-one correspondence with
the vectors $e_1 a + e_2 b$, where $e_1$, $e_2$ are the basis vectors of the coordinate
system, i.e., the vectors with coordinates (1, 0), (0, 1). It is obvious that
the relationship between a dilative rotation and the corresponding vector
can also be described in the following way: the dilative rotation takes
the vector $e_1$ into the vector corresponding to the rotation. Thus the
complex numbers we are seeking may be defined either as the dilative
rotations or as the vectors. A plane in which the complex numbers are
so represented is called the Gauss plane. If we choose the first possibility,
multiplication of the dilative rotations at once defines multiplication
of complex numbers and shows that the nonzero complex numbers form
a commutative group with respect to multiplication. Addition of complex
numbers is simply defined as the addition of the corresponding vectors:
if the complex numbers $z$, $z'$ correspond to the vectors $\mathfrak{z}$, $\mathfrak{z}'$, then that
dilative rotation which takes $e_1$ into $\mathfrak{z} + \mathfrak{z}'$ is denoted by $z + z'$. Thus
the complex numbers also form a commutative group with respect to
addition. In order to show that the set of complex numbers with these
definitions of addition and multiplication constitutes a field, it only remains
to prove the distributive law of multiplication. For the proof we first
note that if


(3)   $z' = \alpha z,$


then the complex number (the dilative rotation) $\alpha$ takes the vector
corresponding to $z$ into the vector corresponding to $z'$, as can be seen
at once by applying $\alpha z$ (first $z$, and then $\alpha$) to $e_1$. Thus (3) represents
the dilative rotation $\alpha$ as a left-multiplication $z \to \alpha z$ in the domain of
complex numbers. Now if $\mathfrak{z}_1$, $\mathfrak{z}_2$ are the vectors corresponding to the
complex numbers $z_1$, $z_2$, then the vectors corresponding to $\alpha z_1$, $\alpha z_2$,
$\alpha(z_1 + z_2)$ are given by the images of $\mathfrak{z}_1$, $\mathfrak{z}_2$, $\mathfrak{z}_1 + \mathfrak{z}_2$ under $\alpha$. But the
sum of the images of $\mathfrak{z}_1$, $\mathfrak{z}_2$ under the affine mapping $\alpha$ is the image of
the sum $\mathfrak{z}_1 + \mathfrak{z}_2$, and by the definition of addition the vector corresponding
to $\alpha z_1 + \alpha z_2$ is the sum of the vectors for $\alpha z_1$, $\alpha z_2$, and it follows that
$\alpha(z_1 + z_2)$ and $\alpha z_1 + \alpha z_2$ have the same corresponding vector: in other
words, $\alpha(z_1 + z_2) = \alpha z_1 + \alpha z_2$, so that multiplication is distributive.
The correspondence between the dilatation (1) and the real number $a$
is obviously an isomorphism of the field of real numbers onto the set of
dilatations. Thus we may equate the real number $a$ with the dilatation (1),
or in other words with that complex number to which the vector $e_1 a$
is in correspondence (cf. the procedure in IB1, §4.4 in (62)). The dilative
rotation corresponding to the vector $e_2$, i.e., the rotation through
90 degrees, has already been denoted by $i$. Then the vector corresponding to the
dilative rotation $ib$ is the image of $e_2$ under the dilatation with the factor $b$, so that the dilative
rotation $ib$ corresponds to $e_2 b$ and the dilative rotation $a + ib$ therefore
corresponds to the vector $e_1 a + e_2 b$. Thus the complex number $a + ib$ is
precisely the dilative rotation represented by (2). In this complex number
$\alpha = a + ib$ the real part $a = \operatorname{Re} \alpha$ and the imaginary part $b = \operatorname{Im} \alpha$ of
$\alpha$ are, of course, uniquely determined. In polar coordinates $a = r \cos\varphi$,
$b = r \sin\varphi$ ($r > 0$), where the angle $\varphi$ is the oriented angle between
$e_1$ and $e_1 a + e_2 b$ and is thus the angle of rotation of the dilative rotation
$a + ib = r(\cos\varphi + i \sin\varphi)$. The fact that under successive rotations the
angles of rotation are simply added is now expressed by the equation


(4)   $(\cos\varphi + i \sin\varphi)(\cos\varphi' + i \sin\varphi') = \cos(\varphi + \varphi') + i \sin(\varphi + \varphi');$


multiplying out on the left and comparing real and imaginary parts on
both sides, we see that this equation is merely a combination of the
theorems of addition for cosines and sines. Complete induction on $n$
yields from (4) the de Moivre formula


(5)   $(\cos\varphi + i \sin\varphi)^n = \cos n\varphi + i \sin n\varphi$


for all natural numbers $n$. In view of the fact that $(\cos n\varphi + i \sin n\varphi)^{-1} =
\cos n\varphi - i \sin n\varphi = \cos(-n)\varphi + i \sin(-n)\varphi$ this formula also holds
for negative integers $n$.
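
The de Moivre formula (5) can be checked numerically, including negative exponents. The following is a small sketch with a hypothetical helper `de_moivre_sides`; agreement is of course only up to floating-point rounding.

```python
import math

def de_moivre_sides(phi: float, n: int) -> tuple[complex, complex]:
    """Return both sides of (cos phi + i sin phi)^n = cos(n phi) + i sin(n phi)."""
    lhs = complex(math.cos(phi), math.sin(phi)) ** n
    rhs = complex(math.cos(n * phi), math.sin(n * phi))
    return lhs, rhs

for n in (5, -3):
    lhs, rhs = de_moivre_sides(0.7, n)
    print(n, abs(lhs - rhs) < 1e-12)
```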

The number $r = \sqrt{a^2 + b^2}$ in polar coordinates is the length, or
modulus, of the vector $e_1 a + e_2 b$ and is thus also called the modulus $|\alpha|$
(or the absolute value) of the complex number $\alpha$. Since $\sqrt{a^2} = |a|$
(see IB1, (66)), this definition is in agreement for real numbers $a$ with
the definition of absolute value in IB1, §3.4. The triangle inequality in
vector algebra shows that the modulus of complex numbers also satisfies
the inequality IB1, (53), and thus we also have the following consequence
(IB1, (54)):


$\bigl|\, |\alpha| - |\beta| \,\bigr| \leq |\alpha + \beta| \leq |\alpha| + |\beta|.$


Since $|\alpha|$ is obviously the dilatation factor for the dilative rotation $\alpha$, we
also obtain (in agreement with IB1, (52)) the equation $|\alpha\beta| = |\alpha|\,|\beta|$.

The addition of complex numbers was defined above as the addition 
of the corresponding vectors. Thus the vector space of complex numbers 
is isomorphic to the two-dimensional (geometric) vector space (cf. IB3, 
§3.1). We will now derive the relations that hold between the multiplication 
of complex numbers and the two familiar vector multiplications. 

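From the above isomorphism


$z = a + bi \to \mathfrak{z} = e_1 a + e_2 b$

we see that for the vectors $\mathfrak{x} = e_1 x_1 + e_2 x_2$, $\mathfrak{y} = e_1 y_1 + e_2 y_2$ we have
the following relation between the inner product $\mathfrak{x} \cdot \mathfrak{y} = x_1 y_1 + x_2 y_2$
(see IB3, §3.2) and the outer (or exterior) product (see IB3, §3.3)
$[\mathfrak{x}, \mathfrak{y}] = x_1 y_2 - x_2 y_1$ (with $\bar{x}$ as the complex conjugate of $x$):


$\bar{x} y = \mathfrak{x} \cdot \mathfrak{y} + i [\mathfrak{x}, \mathfrak{y}].$


Taking complex conjugates on both sides of this equation (§1.2), we have
$x \bar{y} = \mathfrak{x} \cdot \mathfrak{y} - i [\mathfrak{x}, \mathfrak{y}]$. From these two equations we obtain


$\mathfrak{x} \cdot \mathfrak{y} = \frac{1}{2} (\bar{x} y + x \bar{y}),$

$[\mathfrak{x}, \mathfrak{y}] = \frac{1}{2i} (\bar{x} y - x \bar{y}).$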

Thus the inner and the outer product of vectors in the plane have been 
reduced to the multiplication of complex numbers. 
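
As a small numerical illustration of this reduction, the inner and outer products can be read off as the real and imaginary parts of $\bar{x} y$. The helper name below is our own, and Python's built-in complex type stands in for the Gauss plane.

```python
def inner_and_outer(x: complex, y: complex) -> tuple[float, float]:
    """Inner product x1*y1 + x2*y2 and outer product x1*y2 - x2*y1,
    obtained as real and imaginary parts of conj(x) * y."""
    product = x.conjugate() * y
    return product.real, product.imag

# Vectors (1, 2) and (3, 4), written as complex numbers.
print(inner_and_outer(1 + 2j, 3 + 4j))   # (11.0, -2.0)
```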


1.2. Algebraic Methods of Introducing the Complex Numbers 


The geometric introduction of the complex numbers has the advantage 
that the operations of addition and multiplication for complex numbers 
are reduced to well-known geometric operations (addition of vectors, 
multiplication of mappings), but this procedure takes no account of the 
fact that the construction of the complex numbers is a purely algebraic 
question. To realize the truth of this statement, we have only to replace 
the dilative rotations by the corresponding matrices 


(6)   $\begin{pmatrix} a & -b \\ b & a \end{pmatrix}.$

This set of matrices is closed with respect to addition, subtraction, and
multiplication of matrices (see IB3, §2.2) and thus forms a ring under
these operations. The commutativity of multiplication is easily shown.
Since the determinant of (6) has the value $a^2 + b^2$, such a matrix (unless
it is the zero matrix) has the inverse


$\begin{pmatrix} ac^{-1} & bc^{-1} \\ -bc^{-1} & ac^{-1} \end{pmatrix}$, with $c = a^2 + b^2$.

Thus the matrices (6) even form a field. Essential for the proof of this
latter statement is the following property of the real numbers:

(7)   If $a \neq 0$ or $b \neq 0$, then $a^2 + b^2 \neq 0$,


which follows immediately from the order properties of the real numbers
(see IB1, §3.4).
In order to regard every real number as a complex number, it is only
necessary to identify $a$ with $\begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix}$. If we then define $i = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, the matrix (6)
can also be written in the form $a + ib$:


$a + ib = \begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix} + \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} b & 0 \\ 0 & b \end{pmatrix} = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}.$


The determinant $c = a^2 + b^2 = |a + ib|^2$ of (6) is called the norm
$N(a + ib)$ of $a + ib$.
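
The matrix model can be tried out directly. The sketch below stores the matrix (6) as a tuple of rows, with our own helper names `mat`, `mat_mul`, and `mat_inverse`, and uses exact rational arithmetic so that the inverse can be checked without rounding.

```python
from fractions import Fraction

def mat(a, b):
    """Matrix (6) representing a + ib, stored as a tuple of rows."""
    return ((a, -b), (b, a))

def mat_mul(m, n):
    """Ordinary product of two 2x2 matrices."""
    return tuple(
        tuple(sum(m[r][k] * n[k][c] for k in range(2)) for c in range(2))
        for r in range(2)
    )

def mat_inverse(a, b):
    """Inverse of matrix (6), using c = a^2 + b^2 (requires c != 0)."""
    c = Fraction(a * a + b * b)
    return mat(a / c, -b / c)

I = mat(0, 1)                                          # the matrix called i in the text
print(mat_mul(I, I) == mat(-1, 0))                     # i^2 = -1
print(mat_mul(mat(1, 2), mat(3, 4)) == mat(-5, 10))    # (1 + 2i)(3 + 4i) = -5 + 10i
print(mat_mul(mat(1, 2), mat_inverse(1, 2)) == mat(1, 0))
```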

This purely algebraic definition of complex numbers shows at the
same time that the only property required of the real numbers, apart
from the fact that they form a field, is the property (7). Now for a field $K$
the property (7) is equivalent to


(8)   $c^2 \neq -1$ for all $c \in K$;


for if $a = c$, $b = 1$, then (8) follows at once from (7), and conversely,
given (8) and, let us say, $a \neq 0$, we may set $c = ba^{-1}$, and then
$a^2 + b^2 \neq 0$ follows from $c^2 \neq -1$, since $a^2 + b^2 = a^2(1 + c^2)$. Thus the complex numbers can be introduced
for any field $K$ with property (8): the result is an extension field of $K$
whose elements can all be expressed rationally (and even linearly) in
terms of $i$ and the elements of $K$, so that this extension field (by IB7, §1.1)
can be denoted by $K(i)$.

The field $K(i)$ is a vector space of dimension 2 over $K$ with the basis $\{1, i\}$,
since $a + ib$ uniquely determines the pair $(a, b)$. Thus in forming $K(i)$
we can dispense with the matrices; we simply take a two-dimensional
vector space over $K$; for example, the space of pairs $(a, b)$ with $a, b \in K$
(cf. IB3, §1.2). If $\{e_1, e_2\}$ is a basis of this vector space with, let us say,
$e_1 = (1, 0)$, $e_2 = (0, 1)$, we then seek to introduce a distributive
multiplication in such a way that $e_1$ is the unit element and $e_2^2 = -e_1$. If we also
require that $(e_\nu a)(e_\mu b) = (e_\nu e_\mu)(ab)$ for $\nu, \mu = 1, 2$ and for all $a, b \in K$,
the multiplication must be as follows:


(9)   $(e_1 a_1 + e_2 a_2)(e_1 b_1 + e_2 b_2) = e_1 (a_1 b_1 - a_2 b_2) + e_2 (a_1 b_2 + a_2 b_1).$


In particular, if we have constructed the vector space of the pairs $(a_1, a_2)$
and have taken $e_1 = (1, 0)$, $e_2 = (0, 1)$, then (9) can be written more
simply as

$(a_1, a_2)(b_1, b_2) = (a_1 b_1 - a_2 b_2,\; a_1 b_2 + a_2 b_1).$


We must now show that under the multiplication defined by (9) the vector
space is actually a field. The calculations necessary for this purpose become
considerably more concise if we introduce matrices, so that it is not
desirable to dispense with them entirely. Finally we must still show that
$x \to e_1 x$ is an isomorphism of the field $K$ onto the set of multiples of $e_1$,
so that we may set $e_1 x = x$ without giving rise to difficulties as a
result of this identification of the operations in $K$ and in the new field.
There is a further danger consisting in the fact that $(e_1 a_1 + e_2 a_2)b$ is
already defined in the vector space, namely as $e_1(a_1 b) + e_2(a_2 b)$. But
since exactly the same value is determined by (9) when $b$ is replaced by
$e_1 b$, everything is in order.
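
The multiplication (9) on pairs can be tried out directly. The following is a minimal sketch over the field of rational numbers (an assumed choice of $K$ with property (8)); the helper names `mul` and `inv` are hypothetical.

```python
from fractions import Fraction


def mul(x, y):
    """Product of two pairs according to (9)."""
    a1, a2 = x
    b1, b2 = y
    return (a1 * b1 - a2 * b2, a1 * b2 + a2 * b1)


def inv(x):
    """Inverse of a nonzero pair (a, b): (a, -b) divided by the norm a^2 + b^2."""
    a, b = x
    c = a * a + b * b  # nonzero by property (7)
    return (a / c, -b / c)


e1 = (Fraction(1), Fraction(0))   # unit element
e2 = (Fraction(0), Fraction(1))   # plays the role of i
z = (Fraction(3), Fraction(4))

print(mul(e2, e2))       # equals -e1
print(mul(z, inv(z)))    # equals e1
```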

Finally the introduction of complex numbers can be subsumed under
the procedure described in IB5, §3.10: in the polynomial ring $K[x]$
we compute mod $x^2 + 1$; that is, we form the residue classes² $\overline{f(x)}$ of
$f(x) \in K[x]$. Again, it is not altogether necessary to define $\overline{f(x)}$ as the set
of polynomials $\equiv f(x)$ mod $x^2 + 1$; alternatively, in accordance with the
procedure described in another connection in IB1, §2.2, we may consider
$\overline{f(x)}$ as a new symbol formed from a polynomial $f(x)$ with the conventions:
$\overline{f(x)} = \overline{g(x)}$ if and only if $f(x) \equiv g(x)$ mod $x^2 + 1$; $\overline{f(x)} + \overline{g(x)} = \overline{f(x) + g(x)}$,
$\overline{f(x)}\,\overline{g(x)} = \overline{f(x) g(x)}$. Then the $\overline{f(x)}$ form a commutative
ring, namely the ring of residue classes of $K[x]$ with respect to $x^2 + 1$.
Since $x^2 + 1$, by (8), has no zero in $K$, this polynomial is irreducible and
the ring of residue classes is thus a field (see IB5, §3.10). If
$f(x) \equiv a$ mod $x^2 + 1$ with $a \in K$, we set $\overline{a} = \overline{f(x)} = a$. The division
algorithm (IB6, §2.10) shows that to every $f(x) \in K[x]$ there correspond
uniquely determined elements $a, b \in K$ with $f(x) \equiv a + bx$ mod $x^2 + 1$,
so that $\overline{f(x)} = a + b\bar{x}$. Thus we need only set $\bar{x} = i$ (then $i^2 = -1$,
since $x^2 \equiv -1$ mod $x^2 + 1$) in order to write the field of residue classes
in the form $K(i)$.
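
The reduction of a polynomial to the form $a + bx$ mod $x^2 + 1$ can be sketched as follows. The function name and the coefficient convention (lowest degree first) are assumptions of this example; the reduction uses only $x^2 \equiv -1$.

```python
def reduce_mod_x2_plus_1(coeffs):
    """Return (a, b) with f(x) = a + b*x mod x^2 + 1.

    coeffs lists the coefficients of f from the constant term upward.
    Since x^2 = -1, the power x^k reduces to +1, +x, -1, -x for k mod 4 = 0, 1, 2, 3.
    """
    a = b = 0
    for k, c in enumerate(coeffs):
        sign = -1 if (k // 2) % 2 else 1
        if k % 2 == 0:
            a += sign * c
        else:
            b += sign * c
    return a, b


# f(x) = x^3 + 2x^2 + 3x + 4 reduces to 2 + 2x, in agreement with
# i^3 + 2i^2 + 3i + 4 = 2 + 2i.
print(reduce_mod_x2_plus_1([4, 3, 2, 1]))
```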

We now investigate the automorphisms of $K(i)$ with respect to $K$
(cf. IB7, §1.2). If $\sigma$ is such an automorphism, then $i^2 = -1$ naturally
implies $\sigma(i)^2 = -1$. Thus, since $x^2 + 1 = (x - i)(x + i)$, we must have
either $\sigma(i) = i$ or $\sigma(i) = -i$. Since $\sigma(a + ib) = a + \sigma(i)b$ we have
in the first case the identical automorphism and in the second case the
mapping $a + ib \to a - ib$. The fact that this mapping is also an automorphism
of $K(i)$ with respect to $K$ follows either from the general theory
(IB7, §6) or from the fact that in calculations involving the sum and
product of complex numbers only the special property $i^2 = -1$ is utilized
(beyond the general rules for addition and multiplication) and this property
holds for $-i$, since $(-i)^2 = -1$. The element $a - ib$ is called the complex
conjugate $\overline{a + ib}$ of $a + ib$. Obviously $\alpha \in K$ is equivalent to $\bar\alpha = \alpha$, and
in general $\alpha + \bar\alpha = 2\,\mathrm{Re}\,\alpha$, $\alpha - \bar\alpha = 2i\,\mathrm{Im}\,\alpha$. Since the passage to complex
conjugates is an automorphism and $N(a + ib) = (a + ib)(a - ib)$, it
follows that

(10) $N(\alpha\beta) = N(\alpha) N(\beta)$;


2 Of course, the overbar here has nothing to do with the notation for a complex
conjugate.

for we have $N(\alpha\beta) = \alpha\beta\,\overline{\alpha\beta} = \alpha\bar\alpha\beta\bar\beta$. Incidentally, the equation (10)
enables us to write the product of sums of two squares again as the sum
of two squares; for example, $(1^2 + 2^2)(3^2 + 4^2) = 5^2 + 10^2$. Finally, we
remark that for $\alpha \neq 0$ the equation $\alpha\bar\alpha = N(\alpha)$ implies $\alpha^{-1} = N(\alpha)^{-1}\bar\alpha$.
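
The numerical example for (10) can be checked with integer pairs. This is a small sketch with hypothetical helper names `mul` and `norm`, using the pair multiplication of §1.2.

```python
def mul(x, y):
    """Product of the complex numbers x = (a1, a2) and y = (b1, b2) as pairs."""
    a1, a2 = x
    b1, b2 = y
    return (a1 * b1 - a2 * b2, a1 * b2 + a2 * b1)


def norm(x):
    """Norm N(a + ib) = a^2 + b^2."""
    a, b = x
    return a * a + b * b


x, y = (1, 2), (3, 4)
p = mul(x, y)                      # (1 + 2i)(3 + 4i) = -5 + 10i
print(p, norm(x) * norm(y), norm(p))
```

The product $-5 + 10i$ has norm $5^2 + 10^2 = 125 = 5 \cdot 25$, which is the identity quoted in the text.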


2. Algebraic Closedness of the Field of Complex Numbers 


2.1. The Intermediate Value Theorem 


A real function $f$ (i.e., a mapping of a set of real numbers, namely of
the domain of definition of $f$, into the set of real numbers) is said to be
continuous at the point $x$ if for every real number $\varepsilon > 0$ there exists a
real number $\delta > 0$ such that for all $x'$ in the domain of definition of
$f$ with $|x - x'| < \delta$ we have $|f(x) - f(x')| < \varepsilon$. It is shown in analysis
(see 112, §1) that the rational entire functions (for their definition see IB4,
§1.1) are everywhere (i.e., at every point) continuous in the field of real
numbers.

The intermediate value theorem now states: if the real function $f$
is defined and continuous at every point $x$ with $a \leq x \leq b$ $(a < b)$ and if
$f(a) < C < f(b)$,³ then there exists a value $c$ with $a < c < b$ such that

$f(c) = C$.


Proof. Let $M$ be the set of $x$ with $a \leq x \leq b$ and $f(x) < C$. Since
$f(a) < C$, the set $M$ is not empty. Thus there exists a real number $c$,
with $a \leq c \leq b$, which is the least upper bound of $M$. If $f(c) < C$, and
thus in particular $c < b$, we set $C - f(c) = \varepsilon$ and then, in view of the
continuity of $f$, we can determine $\delta$ with $c < c + \delta \leq b$ such that
$|f(x) - f(c)| < \varepsilon$ for all $x$ with $c \leq x \leq c + \delta$. For these values of
$x$ we would then have $f(x) = f(c) + f(x) - f(c) < C$, in contradiction
to the fact that $c$ is an upper bound of $M$. On the other hand, if $f(c) > C$,
and thus in particular $a < c$, we set $f(c) - C = \varepsilon$ and determine $\delta$ with
$a \leq c - \delta < c$ such that $|f(x) - f(c)| < \varepsilon$ for all $x$ with $c - \delta \leq x \leq c$.
For these values of $x$ we would then have $f(x) = f(c) + f(x) - f(c) > C$,
so that $c - \delta$ $(< c)$ would be an upper bound of $M$, in contradiction to
the definition of $c$ as the least upper bound of $M$. Hence $f(c) = C$.

In view of the above-mentioned continuity of the rational entire
functions, we have thus proved the intermediate value theorem for
polynomials over the field of real numbers, a theorem which for greater
simplicity we now write in the form of a theorem on the zeros of a
polynomial:⁴

If the polynomial $f(x)$ satisfies the inequality $f(a) < 0 < f(b)$ with $a < b$,
then $f(x)$ has a zero $c$ with $a < c < b$.

3 We could also assume $f(a) > C > f(b)$, which would amount to the above case if
we replace $f$, $C$ by $-f$, $-C$.

4 By applying this theorem to the polynomial $f(x) - C$ we obtain the general
intermediate value theorem (for polynomials).

This theorem holds not only for the field of real numbers but also for 
certain other ordered fields. For example, we obtain such a field if we 
restrict ourselves to the real algebraic numbers. These numbers actually 
form a field A; for it was shown in IB7, §2 that for two elements that 
are algebraic over a field K (here the field R of rational numbers) the 
difference and the quotient of the two elements are also algebraic over K. 
If $\sum_{k=0}^{n} a_k c^k = 0$, $a_k \in A$, $a_n \neq 0$, the real number $c$ is algebraic over
$R(a_0, \ldots, a_n)$ and thus (by IB7, §2) also over $R$, and is therefore an algebraic
number. Consequently, we have also proved the intermediate value 
theorem for polynomials over A. If the intermediate value theorem holds 
for polynomials over a field K, we shall say for brevity: the intermediate 
value theorem holds in K. 
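
The theorem on the zeros of a polynomial guarantees only the existence of a zero. As a sketch of how such a zero can be enclosed in practice, the following example halves the interval repeatedly (bisection, a technique not discussed in the text) with exact rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction


def evaluate(coeffs, x):
    """Horner evaluation; coeffs are listed from the highest degree down."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


def bisect_zero(coeffs, a, b, steps=40):
    """Shrink [a, b] around a zero, assuming f(a) < 0 < f(b)."""
    a, b = Fraction(a), Fraction(b)
    assert evaluate(coeffs, a) < 0 < evaluate(coeffs, b)
    for _ in range(steps):
        m = (a + b) / 2
        v = evaluate(coeffs, m)
        if v == 0:
            return m, m
        if v < 0:
            a = m
        else:
            b = m
    return a, b


# f(x) = x^2 - 2 with f(1) < 0 < f(2); the zero is the real algebraic number sqrt(2).
lo, hi = bisect_zero([1, 0, -2], 1, 2)
print(float(lo), float(hi))
```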

From the intermediate value theorem for polynomials we have the
Sturm theorem, which allows us in general to state the number of distinct
zeros $c$ with $a < c < b$: for if we are given the polynomial $f(x) \in K[x]$,
let us form the Sturm chain $(f(x), f'(x), f_1(x), \ldots, f_r(x))$, where the $f_k(x)$
are determined by $f_{k-1}(x) = q_k(x) f_k(x) - f_{k+1}(x)$ $(k = 0, \ldots, r - 1)$,
$f_{-1}(x) = f(x)$, $f_0(x) = f'(x)$, $f_{r-1}(x) = q_r(x) f_r(x)$, with polynomials $q_k(x)$
and degree $f_k(x)$ < degree $f_{k-1}(x)$ $(k = 1, \ldots, r)$; and if now for $u \in K$ we
denote the number of changes of sign in the sequence

$f(u), f'(u), f_1(u), \ldots, f_r(u)$

by $w(u)$, where zeros are disregarded, it follows that if $f(a), f(b) \neq 0$, $a < b$,
then $f(x)$ has exactly $w(a) - w(b)$ distinct zeros $c$ with $a < c < b$.
We shall not prove this theorem here⁵ but will merely illustrate it for
the polynomial $x^2 - 1$. Here the Sturm chain is $(x^2 - 1, 2x, 1)$, and we
have $w(u) = 2$ for $u < -1$, $w(u) = 1$ for $-1 < u < 1$, $w(u) = 0$ for
$u > 1$. Thus the assertion of the Sturm theorem actually follows for
the polynomial $x^2 - 1$.
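
The Sturm chain and the sign-change count $w(u)$ can be computed mechanically. The following is a minimal sketch over the rational numbers; the function names and the coefficient convention (highest degree first) are assumptions of this example.

```python
from fractions import Fraction


def poly_rem(num, den):
    """Remainder of num divided by den (coefficients from the highest degree down)."""
    num = [Fraction(c) for c in num]
    while len(num) >= len(den):
        factor = num[0] / den[0]
        for i in range(len(den)):
            num[i] -= factor * den[i]
        num.pop(0)                      # the leading coefficient is now zero
    while num and num[0] == 0:          # strip remaining leading zeros
        num.pop(0)
    return num


def derivative(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]


def sturm_chain(f):
    """Chain f, f', f_1, ..., f_r with each new term the negative remainder."""
    chain = [[Fraction(c) for c in f], [Fraction(c) for c in derivative(f)]]
    while True:
        r = poly_rem(chain[-2], chain[-1])
        if not r:
            return chain
        chain.append([-c for c in r])


def evaluate(p, x):
    value = Fraction(0)
    for c in p:
        value = value * x + c
    return value


def sign_changes(chain, u):
    """w(u): number of sign changes in the chain at u, zeros disregarded."""
    values = [v for v in (evaluate(p, u) for p in chain) if v != 0]
    return sum(1 for s, t in zip(values, values[1:]) if (s > 0) != (t > 0))


def count_zeros(f, a, b):
    chain = sturm_chain(f)
    return sign_changes(chain, a) - sign_changes(chain, b)


print(sturm_chain([1, 0, -1]))        # the chain (x^2 - 1, 2x, 1)
print(count_zeros([1, 0, -1], -2, 2))
```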


2.2. Real-Closed Fields 


A field K is said to be real-closed if it can be ordered in such a way 
that the intermediate value theorem (for polynomials) is valid. We now 
wish to deduce for real-closed fields a certain characterization in which 
there is no mention of the order. By IB1, §3.4 we know that in every
ordered field $K$, and thus in every real-closed field, we have

(11) $\sum_{k=1}^{n} a_k^2 \neq -1$, if $a_1, \ldots, a_n \in K$.


5 A proof is to be found, e.g., in van der Waerden [2], §69. 
We also have, in every real-closed field:

(12) If $a \in K$ and $a \neq b^2$ for all $b \in K$, there exists a $c \in K$ with $-a = c^2$.

For the proof we assume that $K$ is ordered in such a way that the intermediate
value theorem holds in $K$. If $a \geq 0$, there exists an element $b \in K$
with $a = b^2$, as was already deduced in IB1, §4.7 from the intermediate
value theorem. Thus, under the hypothesis in (12), we must have $a < 0$
and therefore $-a > 0$, so that, as we have just remarked, there must
exist an element $c \in K$ with $-a = c^2$.

Finally, we have the following theorem for a real-closed field K: 


(13) Every polynomial of odd degree in $K[x]$ has a zero in $K$.

For the proof we may restrict our attention to a polynomial
$f(x) = x^n + \sum_{k=0}^{n-1} a_k x^k$ $(a_k \in K)$. If we let $b$ denote the maximum of
$1, 1 - a_0, \ldots, 1 - a_{n-1}$, we have $b^k > 0$, $a_k \geq -(b - 1)$ and thus
$f(b) \geq b^n - (b - 1)\sum_{k=0}^{n-1} b^k = 1$. Applying the same procedure to the
polynomial $-f(-x)$, whose leading term must also be $x^n$ in view of the
fact that $n$ is odd, we obtain an $a \leq -1$ with $f(a) \leq -1$. By the intermediate
value theorem there must exist a zero $c$ of $f(x)$ with $a < c < b$.
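
The bounds used in this proof are easy to compute. The sketch below does so for one cubic with integer coefficients (an arbitrary example chosen here, with hypothetical function names) and confirms $f(b) \geq 1$ and $f(a) \leq -1$.

```python
def upper_bound(lower_coeffs):
    """b = max(1, 1 - a_0, ..., 1 - a_{n-1}) for the monic polynomial x^n + sum a_k x^k."""
    return max([1] + [1 - c for c in lower_coeffs])


def f(lower_coeffs, x):
    """Value of x^n + sum a_k x^k, where lower_coeffs = [a_0, ..., a_{n-1}]."""
    n = len(lower_coeffs)
    return x ** n + sum(c * x ** k for k, c in enumerate(lower_coeffs))


coeffs = [-7, 3, -5]                     # f(x) = x^3 - 5x^2 + 3x - 7
b = upper_bound(coeffs)

# -f(-x) = x^n + sum a_k (-1)^(k+1) x^k when n is odd.
mirrored = [c * (-1) ** (k + 1) for k, c in enumerate(coeffs)]
a = -upper_bound(mirrored)

print(a, f(coeffs, a), b, f(coeffs, b))
```

A zero of $f$ then lies between $a$ and $b$ by the intermediate value theorem.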

But we have herewith obtained the desired characterizing properties: 
a field K is real-closed if and only if it has the properties (11), (12), (13). 

It remains only to prove that (11), (12), (13) imply that K can in fact 
be ordered in such a way that the intermediate value theorem holds in K. 
We first show that on the basis of (11), (12) the field K can be ordered 
in exactly one way. By IB1, §3.4 we know that a domain of positivity in $K$
will in any case contain all the squares $a^2$ with $a \neq 0$, $a \in K$. Thus the
desired result will follow if for the set $P$ of these squares we can deduce
the characterizing properties IB1, (44) of a domain of positivity. The
relations IB1, (44, ,) are obvious and IB1, (44,) follows at once from (12).
But if $a^2 + b^2$ were not a square (so that in particular $a, b \neq 0$), then
by (12) there would exist an element $c \in K$ with $-(a^2 + b^2) = c^2$, which
would imply $-1 = (a/c)^2 + (b/c)^2$ in contradiction to (11). Since (11)
also implies the impossibility of $a^2 + b^2 = 0$ for $a \neq 0$, we have thus
completed the proof of IB1, (44,). The proof that under the ordering
defined by this domain of positivity (in which every positive element is a 
square) the intermediate value theorem follows from (13) will be postponed 
to §2.3, where the investigation of real-closed fields will be based not 
on the original definition of this concept but only on the properties (11), 
(12), (13). 

As was proved in IB1, §3, the field $K$ can be ordered if and only if
(11) holds; in this case the field is said to be formally real. A field $K$ is
real-closed if and only if $K$ itself is formally real but no proper algebraic
extension of K has this property. We shall not give a proof of this fact, 
which is often used as a definition of real-closed fields. Likewise without 
proof we mention that every real-closed algebraic extension of the field 
of rational numbers is isomorphic to the field A of real algebraic numbers 
introduced in §2.1; in this way the latter field is characterized (up to 
isomorphism) in a purely algebraic way, and thus in particular without 
any use of the field of real numbers. For the proofs of these statements 
we refer the reader to van der Waerden [2], §71. 


2.3. Algebraic Closure of a Real-Closed Field 


Let the field $K$ have the properties (11), (12), (13). Since (11) implies (8),
we can form the extension field $K(i)$, as was done in §1.2. We now prove
the following basic theorem.

A polynomial of positive degree in $K[x]$ has a zero in $K(i)$. Let us write
the degree $n$ of the polynomial $f(x)$ in the form $n = 2^l m$ ($m$ odd) and
prove the assertion by complete induction on $l$. For $l = 0$ the polynomial
$f(x)$ has a zero in $K$ itself, by (13). Thus we need only prove the assertion
for $l > 0$ under the induction hypothesis, i.e., under the assumption
that the assertion holds for polynomials whose degree is divisible by $2^{l-1}$
but not by $2^l$. By IB7, §1.5 there exists an extension $L = K(i, \alpha_1, \ldots, \alpha_n)$
with $f(x) = c \prod_{\nu=1}^{n} (x - \alpha_\nu)$ $(c \in K)$. We now set $N = n(n - 1)/2$ and
form the polynomials of degree $N$ in $L[x]$:

$$f_h(x) = \prod_{1 \leq \nu < \mu \leq n} \bigl(x - (\alpha_\nu\alpha_\mu + h(\alpha_\nu + \alpha_\mu))\bigr) \quad \text{for } h = 0, \ldots, N.$$

Since the coefficients of these polynomials are obviously the values of
symmetric polynomials (in $n$ indeterminates with rational integers as
coefficients) for the arguments $\alpha_1, \ldots, \alpha_n$, it follows from the fundamental
theorem on the elementary symmetric polynomials (see IB4, §2.4) and
from the equations IB4, (25), (26) that these coefficients are already
contained in $K$. Since $N = 2^{l-1}m(n - 1)$ and $m(n - 1)$ is odd, we can
apply the induction hypothesis to $f_h(x)$: one of the zeros of $f_h(x)$ is already
contained in $K(i)$. Thus for every value of $h$ $(= 0, \ldots, N)$ there exists a
pair of numbers $(\nu_h, \mu_h)$ with $1 \leq \nu_h < \mu_h \leq n$ and a pair of elements
$a_h, b_h \in K$ such that

$$\alpha_{\nu_h}\alpha_{\mu_h} + h(\alpha_{\nu_h} + \alpha_{\mu_h}) = a_h + ib_h.$$

But by the Dirichlet pigeonhole principle (see IB1, §1.5) the mapping
$h \to (\nu_h, \mu_h)$ cannot be invertible, since the set of preimages contains
$N + 1$ elements but the set of images contains at most $N$ elements. Thus
there exist two distinct values $h, k$ with $\nu_h = \nu_k = \nu$, $\mu_h = \mu_k = \mu$.
A simple calculation then shows⁶

$$\alpha_\nu\alpha_\mu = (h - k)^{-1}\bigl(h(a_k + ib_k) - k(a_h + ib_h)\bigr),$$
$$\alpha_\nu + \alpha_\mu = (h - k)^{-1}\bigl((a_h + ib_h) - (a_k + ib_k)\bigr).$$

Consequently, the coefficients of the polynomial $x^2 - (\alpha_\nu + \alpha_\mu)x + \alpha_\nu\alpha_\mu$,
which has the zeros $\alpha_\nu, \alpha_\mu$, are contained in $K(i)$. Thus in order to show
$\alpha_\nu, \alpha_\mu \in K(i)$ it only remains to prove that every element $a + ib$ of $K(i)$,
and in particular the discriminant of the above polynomial, is a square
in $K(i)$. But from the ordering of $K$ (which by §2.2 is unique) we have
$a^2 + b^2 \geq 0$, and thus there exists a $c \in K$ with $a^2 + b^2 = c^2$. Since
$c$ or $-c \geq 0$, we may also assume $c \geq 0$. Then $c + |a| \geq 0$, and
from $(c + |a|)(c - |a|) = b^2 \geq 0$ it follows that $c - |a| \geq 0$,
so that $c + a,\; c - a \geq 0$. Thus there exist elements $u, v \in K$ with
$u^2 = (c + a)/2$, $v^2 = (c - a)/2$, $uvb \geq 0$, and then in view of $(2uv)^2 = b^2$
we have $2uv = b$ and therefore $(u + iv)^2 = (u^2 - v^2) + 2uvi = a + ib$.
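
The construction of a square root of $a + ib$ given at the end of this proof can be followed step by step. The sketch below does so for the field of real numbers in floating-point arithmetic (so the results are approximate in general); the function name is hypothetical.

```python
import math


def complex_sqrt(a, b):
    """Return (u, v) with (u + iv)^2 = a + ib, following the construction in the proof."""
    c = math.sqrt(a * a + b * b)            # c >= 0 with c^2 = a^2 + b^2
    u = math.sqrt(max(0.0, (c + a) / 2))    # u^2 = (c + a)/2
    v = math.sqrt(max(0.0, (c - a) / 2))    # v^2 = (c - a)/2
    if b < 0:                               # choose the signs so that u*v*b >= 0
        v = -v
    return u, v


for a, b in ((3, 4), (-3, -4), (0, 2)):
    u, v = complex_sqrt(a, b)
    print((a, b), (u, v), (u * u - v * v, 2 * u * v))
```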

On the basis of the theorem that has just been proved, we can show
that $K(i)$ is algebraically closed, i.e., every polynomial $f(x)$ of positive
degree in $K(i)[x]$ has a zero in $K(i)$. For if by $\bar f(x)$ we denote the polynomial
whose coefficients are the complex conjugates of the corresponding
coefficients of $f(x)$, a short calculation shows that the coefficients of
$g(x) = f(x)\bar f(x)$ are identical with their complex conjugates and are thus
contained in $K$. Consequently there exists an $\alpha \in K(i)$ with $g(\alpha) = 0$,
so that $f(\alpha) = 0$ or $\bar f(\alpha) = 0$. In the second case the passage to complex
conjugates shows at once that $f(\bar\alpha) = 0$, so that the proof of the theorem
is complete.

From the fact that the field $L$ is algebraically closed it follows in general
that every polynomial of positive degree in $L[x]$ splits into linear factors
in this ring; that is, the polynomial is the product of linear polynomials.
For if $\alpha_1, \ldots, \alpha_r$ are the distinct zeros of $f(x)$ in $L$ with multiplicities
$m_1, \ldots, m_r$, then by IB4, §2.2 there exists a polynomial $h(x) \in L[x]$ with
$f(x) = h(x) \prod_{k=1}^{r} (x - \alpha_k)^{m_k}$. By the definition of the multiplicity of a
zero (see IB4, §2.2) we have $h(\alpha_k) \neq 0$, so that $h(x)$ has no zero in $L$ and
must therefore, since $L$ is algebraically closed, be of degree 0; in other
words, $h(x)$ lies in $L$, which completes the desired factorization.

6 Since $K$ can be ordered, it follows that if the integers $h$, $k$ are distinct, then $h - k$
as element of the field (which for $h > k$ is a sum of $h - k$ summands, each equal to the
unit element of $K$) is actually $\neq 0$.

If the coefficients of $f(x)$ are already contained in $K$, then passage to
complex conjugates shows that the equation $f(\alpha) = 0$ with $\alpha \in K(i)$
implies $f(\bar\alpha) = 0$: in other words, if the zero $\alpha$ of $f(x)$ is in $K(i)$ but not
in $K$, then $\bar\alpha$ is another zero of $f(x)$. If in the linear factorization of $f(x)$
we combine the factors $x - \alpha$, $x - \bar\alpha$, then in view of

$(x - \alpha)(x - \bar\alpha) = x^2 - 2(\mathrm{Re}\,\alpha)x + N(\alpha)$,

we obtain a factorization of $f(x)$ into two factors in $K[x]$, one of which
is quadratic. By complete induction on the degree of $f(x)$ it follows that
every polynomial of positive degree in $K[x]$ can be factored into linear
and quadratic factors. But now we can make good the omission in §2.2,
by proving the intermediate value theorem: for if $f(a)$, $f(b)$ are of different
sign, then at least one of the factors in the factorization of $f(x)$ must be
of different sign for the values $a$ and $b$; but for a quadratic factor this is
impossible in view of

$(x - \alpha)(x - \bar\alpha) = (x - \mathrm{Re}\,\alpha)^2 + (\mathrm{Im}\,\alpha)^2$,

and for the linear factor $x - c$ it means that $a - c < 0 < b - c$; in
other words, $f(x)$ actually has a zero $c \in K$ with $a < c < b$.
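
The two identities for $(x - \alpha)(x - \bar\alpha)$ can be checked numerically. The following sketch uses Python's built-in complex numbers for one sample value of $\alpha$; the function name is an assumption of this example.

```python
def real_quadratic(alpha):
    """Coefficients (p, q) of (x - alpha)(x - conj(alpha)) = x^2 + p*x + q."""
    return -2 * alpha.real, alpha.real ** 2 + alpha.imag ** 2


alpha = complex(1, 2)
p, q = real_quadratic(alpha)                 # x^2 - 2x + 5
value_at_alpha = alpha * alpha + p * alpha + q

# For real x the factor equals (x - Re alpha)^2 + (Im alpha)^2 and cannot change sign.
samples = [x * x + p * x + q for x in (-3.0, 0.0, 1.0, 4.0)]
print(p, q, value_at_alpha, samples)
```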

If for K we now take the field of real numbers and then the field of 
complex numbers, we have the following result: 

In the ring of polynomials in one indeterminate⁷ over the field of real
numbers every polynomial of positive degree can be factored into linear and
quadratic factors; and over the field of complex numbers every such polynomial
can be factored into linear factors.

This theorem is often called the fundamental theorem of algebra.
The name was justified as long as algebra was confined to the study of
the field of complex numbers and the fields and rings contained in it.
But today the field of complex numbers has lost its central importance
for algebra. It would be better to call this theorem the fundamental
algebraic theorem for complex numbers.⁸


3. Quaternions


In §1 we have constructed the field of complex numbers as an extension
of the field of real numbers. It is now natural to ask whether we can
proceed in the same way beyond the complex numbers. Such an extension
is possible, as we shall show below, if we abandon the commutativity 
of multiplication; in §3.4 we shall discover to what extent such a weakening 
of the axioms is in fact necessary. 


7 Even for only two indeterminates the theorem is no longer true; the quadratic
polynomial $1 + xy$ in the indeterminates $x$, $y$ cannot be factored into linear factors over
any field.

8 As the fundamental topological theorem for complex numbers we could consider the
Cauchy convergence criterion, i.e., the fact that every fundamental sequence (see IB1,
§4.4) of complex numbers has a limit.
3.1. Quaternions as Hermitian Dilative Rotations

In order to repeat for the complex numbers the step which in §1 led us
to them from the real numbers we now introduce into the affine complex
plane a Hermitian metric; in other words the length of a vector with the
complex coordinates $z_1, z_2$ (in a given coordinate system) is now defined
as the square root of the positive real number $z_1\bar z_1 + z_2\bar z_2$. By Hermitian
rotations about the origin we now mean affine length-preserving mappings
with determinant⁹ 1, leaving the origin fixed. Then how are these mappings
to be expressed? An affine mapping with the origin as fixed point is
given by

(14) $z_1' = \alpha z_1 + \gamma z_2$,
     $z_2' = \beta z_1 + \delta z_2$,

and the fact that lengths are preserved means that

(15) $z_1\bar z_1 + z_2\bar z_2 = (\alpha\bar\alpha + \beta\bar\beta) z_1\bar z_1 + (\alpha\bar\gamma + \beta\bar\delta) z_1\bar z_2 + (\bar\alpha\gamma + \bar\beta\delta) \bar z_1 z_2 + (\gamma\bar\gamma + \delta\bar\delta) z_2\bar z_2$.

If in (15) we set $z_1 = 1$, $z_2 = 0$ and then $z_1 = 0$, $z_2 = 1$, we obtain the
following equations

(16) $\alpha\bar\alpha + \beta\bar\beta = 1$, $\gamma\bar\gamma + \delta\bar\delta = 1$.

For $z_1 = z_2 = 1$ and $z_1 = 1$, $z_2 = i$ the equations (15) and (16) imply

$(\alpha\bar\gamma + \beta\bar\delta) + (\bar\alpha\gamma + \bar\beta\delta) = 0$, $(\alpha\bar\gamma + \beta\bar\delta) - (\bar\alpha\gamma + \bar\beta\delta) = 0$,

from which it follows that

(17) $\alpha\bar\gamma + \beta\bar\delta = 0$.

Conversely, the preservation of length, or in other words (15), follows
from (16), (17). Taken together with the requirement on the determinant
$\beta\gamma - \alpha\delta = -1$ and the first equation (16), the equation (17) means that
$\gamma = -\bar\beta$, $\delta = \bar\alpha$, so that by (14) the Hermitian rotations have the
form

(18) $z_1' = \alpha z_1 - \bar\beta z_2$,
     $z_2' = \beta z_1 + \bar\alpha z_2$,

with $\alpha\bar\alpha + \beta\bar\beta = 1$. If after a mapping of this sort we apply a dilatation
with a real factor (which may = 0), we again obtain a mapping of the
form (18), but now without the requirement $\alpha\bar\alpha + \beta\bar\beta = 1$. Conversely,
it is obvious that every mapping (18) can be produced by the successive
application of a Hermitian rotation and a dilatation (with the real dilatation
factor $\sqrt{\alpha\bar\alpha + \beta\bar\beta}$). Thus the mappings (18) are called Hermitian dilative
rotations.

9 It would actually be enough to require that the determinant be real and positive.

But for the sake of simplicity we shall operate below not with these
mappings but with the matrices

(19) $A = \begin{pmatrix} \alpha & -\bar\beta \\ \beta & \bar\alpha \end{pmatrix}$,

which (if the coordinate system is fixed) are in one-to-one correspondence
with such mappings. As can easily be seen, these matrices form a subring
of the ring of all two-rowed square matrices; in other words, the sum,
difference, and product of two matrices (19) are again of the same form.
Then the mapping

$a \to \begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix}$

is seen at once to be an isomorphism of the field of real numbers into
the ring of matrices, so that without fear of misunderstanding we may set

(20) $a = \begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix}$

for all real numbers. In this manner our subring of the ring of matrices
becomes an extension ring $Q$ of the field of real numbers. For $A \neq 0$
we have $|A| = \alpha\bar\alpha + \beta\bar\beta \neq 0$, in view of the fact that $\alpha\bar\alpha + \beta\bar\beta$, as the
sum of four squares, can be equal to zero only if all four summands,
and thus also $\alpha$, $\beta$, are equal to zero. Since $|A|$ is real, it is easy to show
that $A^{-1}$ again lies in $Q$. Thus $Q$ is actually a skew field (see the definition
in IB3, §1.1). If we set¹⁰

(21) $j = \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}$, $k = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, $l = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}$,

(22) $\alpha = a_0 + ia_3$, $\beta = a_2 + ia_1$,

it follows from (19) and (20) that

(23) $A = a_0 + ja_1 + ka_2 + la_3$.


Since $a_0 + ja_1 + ka_2 + la_3 = 0$ ($a_0, a_1, a_2, a_3$ real) implies $\alpha = \beta = 0$
and thus $a_0 = a_1 = a_2 = a_3 = 0$, the $1, j, k, l$ are linearly independent
over the field of real numbers, so that $Q$ is a vector space of dimension 4
over this field. It is for this reason that $Q$ is called the skew field of quaternions
and its elements are called quaternions (Latin quaternio = set of
four).

10 It is customary to write $i, j, k$ instead of $j, k, l$ but we have intentionally avoided
this notation, since the square of each of the quaternions $j, k, l$ is equal to $-1$; thus,
if we wish, we may take any one of these quaternions to be the complex number $i$, if
on the basis of an isomorphism of the field of complex numbers into the skew field of
quaternions we set the complex numbers equal to certain quaternions.

By (20) the real number 1 is to be replaced in the multiplication of
quaternions by the unit matrix, and thus it is the unit element of $Q$.
From (19), (22) and $A^* = \begin{pmatrix} \bar\alpha & \bar\beta \\ -\beta & \alpha \end{pmatrix}$ we have

$$AA^* = A^*A = \alpha\bar\alpha + \beta\bar\beta = \sum_{\nu=0}^{3} a_\nu^2, \qquad A^* = a_0 - ja_1 - ka_2 - la_3.$$

The number $\sum_{\nu=0}^{3} a_\nu^2$ is called the norm $N(A)$ of $A$, and $A^*$ the quaternion
conjugate of $A$. Since $A^*$ arises from $A$ by transposition (see IB3, §2.6)
and passage to complex conjugates, the invertible mapping $A \to A^*$ of $Q$
onto itself is an antiautomorphism; i.e., $(A + B)^* = A^* + B^*$ as in the
case of an automorphism, but now $(AB)^* = B^*A^*$. In exact analogy
with the equation (10) we have*

$N(AB) = AB(AB)^* = ABB^*A^* = AA^*N(B) = N(A) N(B)$.

Thus we can express the product of sums of four squares as the sum of
four squares; for example,

$(1^2 + 2^2 + 3^2 + 4^2)(5^2 + 6^2 + 7^2 + 8^2) = 12^2 + 24^2 + 30^2 + 60^2$.

For $A \neq 0$ we obtain, in the same way as for complex numbers:
$A^{-1} = N(A)^{-1}A^*$.
From (21) it follows that


(24) $j^2 = k^2 = l^2 = -1$, $jk = l$, $kl = j$, $lj = k$,
     $kj = -l$, $lk = -j$, $jl = -k$.

Thus multiplication is not commutative, so that the quaternions do not
form a field. However, by (20) we have $aA = Aa$ for every real number $a$
and $A \in Q$. Thus, exactly as in §1.2, we can construct the skew field $Q$
by choosing a four-dimensional vector space with a basis $\{e_0, e_1, e_2, e_3\}$
and defining for it a distributive multiplication in such a way that
$(e_\nu a)(e_\mu b) = (e_\nu e_\mu)(ab)$ for $\nu, \mu = 0, 1, 2, 3$ and arbitrary $a, b$, where $e_0$ is
the unit element and equations (24) hold for $e_1, e_2, e_3$ in place of
$j, k, l$.

* The third equality follows from the fact that $aA = Aa$ for every quaternion $A$ and
every real number $a$, and $N(B)$ is real.
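
The relations (24) and the four-squares example can be verified with 2 x 2 complex matrices. The sketch below assumes the reading $\alpha = a_0 + ia_3$, $\beta = a_2 + ia_1$ of (22) and the matrix form (19) with rows $(\alpha, -\bar\beta)$ and $(\beta, \bar\alpha)$; the function names are hypothetical.

```python
def mat_mul(A, B):
    """Product of two 2 x 2 matrices given as nested tuples."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def quaternion(a0, a1, a2, a3):
    """Matrix (19) for a0 + j*a1 + k*a2 + l*a3 (assumed reading of (22))."""
    alpha, beta = complex(a0, a3), complex(a2, a1)
    return ((alpha, -beta.conjugate()), (beta, alpha.conjugate()))


def components(A):
    """Recover (a0, a1, a2, a3) from a matrix of the form (19)."""
    alpha, beta = A[0][0], A[1][0]
    return (alpha.real, beta.imag, beta.real, alpha.imag)


def norm(A):
    """N(A) = |A|, the determinant of the matrix."""
    (a, b), (c, d) = A
    return (a * d - b * c).real


j = quaternion(0, 1, 0, 0)
k = quaternion(0, 0, 1, 0)
l = quaternion(0, 0, 0, 1)
minus_one = quaternion(-1, 0, 0, 0)

A, B = quaternion(1, 2, 3, 4), quaternion(5, 6, 7, 8)
product = components(mat_mul(A, B))
print(product, norm(A) * norm(B), sum(c * c for c in product))
```

Up to sign and order, the components of the product are 12, 24, 30, 60, as in the example above.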
In the preceding discussion we could equally well replace the field
of real numbers by any other field $K$ in which $\sum_{\nu=0}^{3} a_\nu^2 = 0$ implies
$a_\nu = 0$ $(\nu = 0, 1, 2, 3)$; for in order to show that $Q$ is a ring we do not
need to impose any conditions on the field $K$, and $Q$ is a skew field if
and only if $N(A) = |A| = \sum_{\nu=0}^{3} a_\nu^2 = 0$ implies $A = 0$, so that
$a_0 = a_1 = a_2 = a_3 = 0$. The above condition on $K$ is equivalent to the
condition that $-1$ is not a sum of two squares of elements from $K$.
For on the one hand the equation $1 + a^2 + b^2 = 0$ $(a, b \in K)$ obviously
violates the given condition, and on the other hand if $-1$ is not the sum
of the squares of two elements from $K$ and if $a_\nu$ $(\nu = 0, \ldots, 3)$ in $K$ are
not all $= 0$, then in order to prove $\sum_{\nu=0}^{3} a_\nu^2 \neq 0$ we may without loss
of generality take $a_1 \neq 0$. In view of the fact that $(a_2/a_1)^2 \neq -1$ we then
have $a_1^2 + a_2^2 \neq 0$, so that $(a_0^2 + a_3^2)(a_1^2 + a_2^2)^{-1} = N(\alpha\beta^{-1})$ by (22).
Since a norm, as the sum of the squares of two elements in $K$, must always
be $\neq -1$, we thus have $\sum_{\nu=0}^{3} a_\nu^2 \neq 0$.

Let us also remark that the concept of the quaternion skew field can be
generalized in the following way: in a field $K$ let there exist elements $c, d$
with $c \neq 0$ and $-d \neq x^2 + y^2c$ for all $x, y \in K$; the construction of $Q$,
starting with a four-dimensional vector space over $K$, is now altered by
replacing¹¹ (24) with

$j^2 = -c$, $k^2 = -d$, $l^2 = -cd$, $jk = l$, $kl = jd$, $lj = kc$,
$kj = -l$, $lk = -jd$, $jl = -kc$.

These generalized quaternion skew fields arise if we ask for those skew
extensions of $K$ in which every element $u$ satisfies the equation $ua = au$
for all $a \in K$ and for every element $u$ there exist certain $b, b' \in K$ with
$u^2 + bu + b' = 0$.¹² It is easy to see that the generalized quaternion skew
fields just defined possess this property: for $u = a_0 + ja_1 + ka_2 + la_3$ we
must set $b = -2a_0$, $b' = a_0^2 + a_1^2c + a_2^2d + a_3^2cd$. If $K$ satisfies the
conditions (11), (12), then the general case can be reduced to the special
case $c = d = 1$; in fact, there then exists (up to isomorphism) exactly one
(generalized) quaternion skew field over $K$.


3.2. Quaternions and Space Rotations 

From the complex affine plane in which in §3.1 we considered the 
Hermitian dilative rotations we now turn to the complex projective line; 
i.e., we consider $z_1, z_2$ as the homogeneous coordinates of a point, with
$z_1/z_2 = z$ as the inhomogeneous coordinate in the case $z_2 \neq 0$, while
$z_2 = 0$ (with $z_1 \neq 0$) gives the ideal point. In this way the complex


11 If K has characteristic 2, then the definition is somewhat different; in this case the 
above definition provides only a field. 
12 For details see e.g. Pickert [3], Section 6.3. 




numbers correspond in one-to-one fashion with the proper points of the
complex projective line. We extend the set of complex numbers by a new
element $\infty$, which we put in correspondence with the ideal point
and use as its coordinate. For convenience we sometimes identify the
points with their coordinates $z_1/z_2$ or $\infty$. The mapping (18) then becomes
the projectivity on the projective line described as follows, with $z$, $z'$ as
the coordinates of a point and its image:

(25) $z' = (\alpha z - \bar\beta)(\beta z + \bar\alpha)^{-1}$, if $\beta z + \bar\alpha \neq 0$, $z \neq \infty$;
     $z' = \infty$, if $\beta z + \bar\alpha = 0$, $z \neq \infty$;
     $z' = \alpha/\beta$, if $z = \infty$, $\beta \neq 0$;¹³
     $z' = \infty$, if $z = \infty$, $\beta = 0$.

Since a common real factor for $\alpha$, $\beta$ is here of no importance, we may
restrict ourselves to the case $\alpha\bar\alpha + \beta\bar\beta = 1$.

We now map the complex projective line onto a sphere of radius 1/2 in
the following way: to the proper point $z = x + iy$ we first assign the
point in space with the coordinates $(x, y, -\tfrac12)$ (in a given Cartesian
coordinate system) and then project this point (stereographically) from
the point $(0, 0, \tfrac12)$ onto the sphere with the equation $\sum_{\nu=1}^{3} x_\nu^2 = \tfrac14$ for
the coordinates $x_1, x_2, x_3$ of its points. To the point $\infty$ we assign the
point $(0, 0, \tfrac12)$ on the sphere, which is not an image under the stereographic
projection. A sphere used in this way for the representation of the complex
numbers is called a Riemann sphere. The equations connecting $z$ $(\neq \infty)$
and the corresponding point of the sphere (with the coordinates $x_1, x_2, x_3$)
are easily calculated to be

(26) $x_1 + ix_2 = z(z\bar z + 1)^{-1}$, $x_1 - ix_2 = \bar z(z\bar z + 1)^{-1}$,
     $2x_3 = (z\bar z - 1)(z\bar z + 1)^{-1}$.

We now assert that under this mapping the projectivities (25) (with
$\alpha\bar\alpha + \beta\bar\beta = 1$) become the entire set of rotations of the sphere.¹⁴
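
The equations (26) and their inverse can be written down directly. This is a small sketch with hypothetical function names, using floating-point complex numbers.

```python
def to_sphere(z):
    """Equations (26): the point (x1, x2, x3) of the sphere of radius 1/2 belonging to z."""
    r2 = (z * z.conjugate()).real
    w = z / (r2 + 1)
    return (w.real, w.imag, (r2 - 1) / (2 * (r2 + 1)))


def from_sphere(x1, x2, x3):
    """Inverse map z = (2*x1 + 2*i*x2)/(1 - 2*x3), valid away from the point (0, 0, 1/2)."""
    return complex(2 * x1, 2 * x2) / (1 - 2 * x3)


z = complex(1, 2)
point = to_sphere(z)
print(point, sum(c * c for c in point), from_sphere(*point))
```

The printed sum of squares is (up to rounding) 1/4, and the last value reproduces $z$.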

We first consider the special case $\beta = 0$. If we set $\rho = \alpha/\bar\alpha$, the projectivity
(25) becomes $z' = \rho z$ with the fixed point $\infty$. Since $\alpha\bar\alpha = 1$, we may set
$\alpha = \cos\varphi/2 + i\sin\varphi/2$ and thus obtain $\rho = \cos\varphi + i\sin\varphi$. We then
have $z'\bar z' = z\bar z$, so that (26) leads to

$x_1' + ix_2' = (\cos\varphi + i\sin\varphi)(x_1 + ix_2)$, $x_3' = x_3$

for the coordinates $x_\nu'$ of the point on the sphere corresponding to $z'$.
But this equation represents the rotation of the sphere through the
angle $\varphi$ around the axis oriented by the vector $e_3$, where we consider
the whole space as oriented by the triple $(e_1, e_2, e_3)$ of basis vectors of
the coordinate system.

13 As follows from (14) if we set $z_2 = 0$, and similarly for the second and fourth cases.

14 In fact, it was the study of rotations in space that led Hamilton (1844) to the
definition of the multiplication of quaternions.

We next consider the special case in which $\alpha$, $\beta$ are real and $\beta \neq 0$.
Multiplication by the complex conjugate of the denominator in (25)
leads, for $z \neq -\alpha/\beta, \infty$ with $z' = x' + iy'$ and the abbreviation

$d = (\beta z + \alpha)(\beta\bar z + \alpha) = \beta^2 z\bar z + 2\alpha\beta x + \alpha^2$,

to the equations

$x'd = (\alpha^2 - \beta^2)x + \alpha\beta(z\bar z - 1)$,
$y'd = y$, $z'\bar z'd = (\alpha z - \beta)(\alpha\bar z - \beta) = \alpha^2 z\bar z - 2\alpha\beta x + \beta^2$.

From (26) we thus have

$x_3' = (\alpha^2 - \beta^2)x_3 - 2\alpha\beta x_1$, $x_1' = 2\alpha\beta x_3 + (\alpha^2 - \beta^2)x_1$, $x_2' = x_2$,

and the same equations can easily be shown to hold for the two cases
excluded above, namely, $z = -\alpha/\beta$ and $z = \infty$. Since we may set
$\alpha = \cos\varphi/2$, $\beta = \sin\varphi/2$, we here obtain the rotations about the axis
oriented by the vector $e_2$.

We now consider the general case, where we may assume $\beta \neq 0$.
Then the projectivity $\sigma$ defined by (25) does not leave the point $\infty$ fixed.
Consequently, its fixed points $\zeta$ are obtained from the quadratic equation

(27) $\beta\zeta^2 + (\bar\alpha - \alpha)\zeta + \bar\beta = 0$.

But this equation may be written (by passage to complex conjugates)
in the form

$\beta(-\bar\zeta^{-1})^2 + (\bar\alpha - \alpha)(-\bar\zeta^{-1}) + \bar\beta = 0$

and thus, if $\zeta$ is a fixed point, so also is $-\bar\zeta^{-1}$ $(\neq \zeta)$. Thus (25) has two
distinct fixed points, and by (26) they give two diametrically opposite
points of the sphere. The line joining these two points can now be brought
into the plane $x_2 = 0$ by a rotation about the $x_3$ axis and then, by rotation
about the $x_2$ axis, can be made to coincide with the $x_3$ axis. These two
rotations taken together produce a rotation $\delta$ which, by the special cases
dealt with above, corresponds to a projectivity $\tau$ taking the fixed points of
$\sigma$ into 0 and $\infty$. Thus the projectivity $\sigma_0 = \tau\sigma\tau^{-1}$ (first $\tau^{-1}$, then $\sigma$, then $\tau$)
has the fixed points $0, \infty$; in other words, it belongs to the case $\beta = 0$.
Thus it corresponds to a rotation $\delta_0$ about the $x_3$ axis, and therefore the
projectivity $\sigma = \tau^{-1}\sigma_0\tau$ corresponds to the rotation $\delta' = \delta^{-1}\delta_0\delta$.
Conversely, every rotation is obtained in this way. For let $\mathbf{d} = \sum_{\nu=1}^{3} e_\nu d_\nu$
with $\sum_{\nu=1}^{3} d_\nu^2 = 1$ be the vector orienting the axis of rotation,
where we may exclude the cases $\mathbf{d} = \pm e_3$. Then the fixed points of
the rotation on the sphere have the coordinates $(d_1/2, d_2/2, d_3/2)$ and
$(-d_1/2, -d_2/2, -d_3/2)$, so that by the equation

$z = (2x_1 + 2x_2 i)(1 - 2x_3)^{-1}$,

which follows from (26), the corresponding complex numbers are

(28) $\zeta_1 = (d_1 + id_2)(1 - d_3)^{-1}, \quad \zeta_2 = -(d_1 + id_2)(1 + d_3)^{-1}$.

If in (25) we set $\alpha = (1 + \zeta_2\bar\zeta_2)^{-1/2}$, $\beta = \bar\zeta_2(1 + \zeta_2\bar\zeta_2)^{-1/2}$, we obviously
obtain a projectivity $\tau$ taking $\zeta_2$ into 0 and (since $\zeta_1 = -\bar\zeta_2^{-1}$) taking
$\zeta_1$ into $\infty$. Thus the corresponding rotation $\delta$ takes the vector $\mathbf{d}$ into $e_3$,
so that $\delta_0 = \delta\delta'\delta^{-1}$ is a rotation about the axis defined by $e_3$ with the
same angle of rotation as $\delta'$ and therefore corresponds to a projectivity $\sigma_0$.
Consequently, the rotation $\delta' = \delta^{-1}\delta_0\delta$ corresponds, as desired, to the
projectivity $\sigma = \tau^{-1}\sigma_0\tau$.

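On the basis of these results we can now express the $\alpha, \beta$ for $\sigma$ in terms
of the $d_\nu$ and the angle of rotation of $\delta'$. Since the projectivity $\sigma_0$ has
the fixed points 0, $\infty$, it takes $z$ into $\rho z$ with a fixed factor $\rho$ of absolute
value 1. If we now apply $\tau\sigma = \sigma_0\tau$ to the point $\infty$, we obtain, assuming
$\beta \ne 0$ (and thus $\zeta_2 \ne 0$),

$(\alpha - \beta\zeta_2)(\alpha\bar\zeta_2 + \beta)^{-1} = \rho\bar\zeta_2^{-1}$,

so that

(29) $\rho = (\alpha - \beta\zeta_2)(\bar\alpha - \bar\beta\bar\zeta_2)^{-1}$.

From (27), (28) we obtain

$2d_3(d_1 - id_2)^{-1} = \zeta_1 + \zeta_2 = (\alpha - \bar\alpha)\beta^{-1}$,

or, in the notation of (22),

$d_3(d_1 + id_2)^{-1} = a_3(a_1 + ia_2)^{-1}$.

Thus there exists a real number $c$ with $a_\nu = cd_\nu$ ($\nu = 1, 2, 3$) and therefore
(since $\alpha\bar\alpha + \beta\bar\beta = 1$, $\sum_{\nu=1}^{3} d_\nu^2 = 1$) with $a_0^2 + c^2 = 1$, so that we may set

(30) $a_0 = \cos \varphi/2, \quad a_\nu = d_\nu \sin \varphi/2 \quad (\nu = 1, 2, 3)$.

From (29), (30) a short calculation shows that

$\rho = \cos\varphi + i\sin\varphi$.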
In view of the significance of $\rho$ for $\sigma_0$ and the earlier discussion of the
special case $\beta = 0$, it follows that $\varphi$ is the angle of rotation of $\delta_0$ and thus
also of $\delta'$. Consequently, the equation (30), which obviously holds for
the special case $\beta = 0$, represents the desired connection between
quaternions and rotations in space. We note that to every rotation the equation
(30) associates exactly two quaternions with norm 1; for if we replace $\varphi$
by $\varphi + 2\pi$, or $\mathbf{d}$ by $-\mathbf{d}$ and $\varphi$ by $2\pi - \varphi$, the numbers $a_\nu$ ($\nu = 0, 1, 2, 3$)
become $-a_\nu$.


3.3. Quaternions and Vector Algebra 


For a fixed Cartesian coordinate system with the basis vectors $e_1, e_2, e_3$
the mapping

$\sum_{\nu=1}^{3} e_\nu x_\nu \to jx_1 + kx_2 + lx_3$

is obviously an isomorphism of the three-dimensional vector space into
the four-dimensional vector space of quaternions. So we may make the
identification

$\sum_{\nu=1}^{3} e_\nu x_\nu = jx_1 + kx_2 + lx_3$.

With $\mathbf{a} = \sum_{\nu=1}^{3} e_\nu a_\nu$ the quaternion

$A = a_0 + ja_1 + ka_2 + la_3$

can then be written in the simple form $a_0 + \mathbf{a}$. Thus $a_0$ is called the
scalar part and $\mathbf{a}$ the vector part of $A$. A simple calculation on the basis
of (24) then shows that

(31) $\mathbf{ab} = -\mathbf{a}\cdot\mathbf{b} + \mathbf{a}\times\mathbf{b}$;

here $\mathbf{ab}$ denotes the product of the vectors regarded as quaternions, and
$\mathbf{a}\cdot\mathbf{b}$ and $\mathbf{a}\times\mathbf{b}$ are the inner (scalar) and vector products, respectively.
If we pass to the conjugate quaternions on both sides of (31) (whereby
the scalar part is not changed and the vector part is multiplied by $-1$)
and note that such a passage is an antiautomorphism, we see that
$\mathbf{ba} = -\mathbf{a}\cdot\mathbf{b} - \mathbf{a}\times\mathbf{b}$. Together with (31) this result shows that:

(32) $\mathbf{a}\cdot\mathbf{b} = -\tfrac{1}{2}(\mathbf{ab} + \mathbf{ba})$,
(33) $\mathbf{a}\times\mathbf{b} = \tfrac{1}{2}(\mathbf{ab} - \mathbf{ba})$.


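Thus (in analogy to the equations at the end of §1.1 for the two-dimensional
case) we may express the scalar and the vector products in terms of
quaternion multiplication.¹⁵

By means of (32), (33) the rules for calculation with these two vector
products can easily be proved. For example, for the rule

(34) $(\mathbf{a}\times\mathbf{b})\times\mathbf{c} = \mathbf{b}(\mathbf{a}\cdot\mathbf{c}) - \mathbf{a}(\mathbf{b}\cdot\mathbf{c})$

we have by (33)

$4(\mathbf{a}\times\mathbf{b})\times\mathbf{c} = \mathbf{abc} - \mathbf{bac} - \mathbf{cab} + \mathbf{cba} = (\mathbf{abc} - \mathbf{cab}) - (\mathbf{bac} - \mathbf{cba})$.

By (32) we also have

$\mathbf{abc} - \mathbf{cab} = (\mathbf{abc} + \mathbf{acb}) - (\mathbf{acb} + \mathbf{cab}) = 2\mathbf{b}(\mathbf{a}\cdot\mathbf{c}) - 2\mathbf{a}(\mathbf{b}\cdot\mathbf{c})$,

and by interchange of $\mathbf{a}$ with $\mathbf{b}$ we obtain

$\mathbf{bac} - \mathbf{cba} = 2\mathbf{a}(\mathbf{b}\cdot\mathbf{c}) - 2\mathbf{b}(\mathbf{a}\cdot\mathbf{c})$.

These three equations taken together lead, after division by 4, directly
to (34).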
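The relations (31) to (34) can be tested on concrete vectors. The sketch below assumes that (24) is the usual multiplication table of the quaternion units, with $j^2 = k^2 = l^2 = -1$ and $jk = -kj = l$; the helper names are hypothetical and integer coordinates are used so that the comparisons are exact.

```python
def qmul(p, q):
    # Product of quaternions written as (scalar, j-part, k-part, l-part),
    # assuming j*j = k*k = l*l = -1 and j*k = l = -k*j.
    a0, a1, a2, a3 = p
    b0, b1, b2, b3 = q
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def as_quaternion(v):
    # A vector is identified with the quaternion having scalar part 0.
    return (0,) + tuple(v)

a, b, c = (1, 2, 3), (-4, 5, 6), (7, -8, 9)
ab = qmul(as_quaternion(a), as_quaternion(b))
ba = qmul(as_quaternion(b), as_quaternion(a))

# (31): ab has scalar part -a.b and vector part a x b.
check_31 = ab == (-dot(a, b),) + cross(a, b)
# (32) and (33), multiplied through by 2 to stay in the integers.
check_32 = tuple(x + y for x, y in zip(ab, ba)) == (-2 * dot(a, b), 0, 0, 0)
check_33 = tuple(x - y for x, y in zip(ab, ba)) == (0,) + tuple(2 * t for t in cross(a, b))
# (34): (a x b) x c = b(a.c) - a(b.c).
lhs_34 = cross(cross(a, b), c)
rhs_34 = tuple(bi * dot(a, c) - ai * dot(b, c) for ai, bi in zip(a, b))
print(check_31, check_32, check_33, lhs_34 == rhs_34)
```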

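If we regard the unit vector $\mathbf{d}$ as a quaternion, the corresponding
rotation is, by §3.2, the rotation $\delta_{\mathbf{d}}$ around the axis determined by $\mathbf{d}$
through the angle $\pi$. Now let $\delta$ be an arbitrary rotation (with the origin
as fixed point) and let $A$ be the corresponding quaternion. Then the
quaternion $A\mathbf{d}A^{-1}$ corresponds to the rotation $\delta_1 = \delta\delta_{\mathbf{d}}\delta^{-1}$. But the
latter is a rotation through the angle $\pi$ and its axis is determined by the
image $\mathbf{d}'$ of $\mathbf{d}$ under $\delta$, since $\delta_1\delta = \delta\delta_{\mathbf{d}}$ implies that the metric relations
valid for $\mathbf{d}, \mathbf{x}, \delta_{\mathbf{d}}(\mathbf{x})$ also hold for $\mathbf{d}', \delta(\mathbf{x}), \delta_1(\delta(\mathbf{x}))$. Thus $A\mathbf{d}A^{-1}$ is again
a quaternion with scalar part 0 and vector part $\mathbf{d}'$ or $-\mathbf{d}'$. For an arbitrary
vector $\mathbf{x}$ (which may always be written as a scalar multiple of a unit
vector $\mathbf{d}$) and its image vector $\mathbf{x}' = \delta(\mathbf{x})$ we thus have

(35) $\mathbf{x}' = \pm A\mathbf{x}A^{-1}$.

But the sign here cannot depend on $\mathbf{x}$, since from

$\mathbf{x}_1' = A\mathbf{x}_1A^{-1}, \quad \mathbf{x}_2' = -A\mathbf{x}_2A^{-1}$

and $\mathbf{x}_1, \mathbf{x}_2 \ne 0$ it follows that

$\mathbf{x}_1' + \mathbf{x}_2' = A(\mathbf{x}_1 - \mathbf{x}_2)A^{-1} = \pm(\mathbf{x}_1 - \mathbf{x}_2)' = \pm(\mathbf{x}_1' - \mathbf{x}_2')$,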

15 Originally the scalar part and the vector part of the quaternion product were 
actually used in place of these vector products. Gibbs (1884) was the first to introduce 
vector products in modern notation, independently of quaternion multiplication. The 
concept of inner product had already occurred in the works of Grassmann (1844), 


whose exterior product (see IB3, §3.3) is also in close relationship with the above vector 
product. 
which is impossible for any choice of sign. With a minus sign the equation
(35) does not represent a rotation. For if we write $A = a + \mathbf{a}$, the
condition $\mathbf{x}' = \mathbf{x}$ becomes $\mathbf{x}(a + \mathbf{a}) = -(a + \mathbf{a})\mathbf{x}$, or in other words $a\mathbf{x} = 0$,
$\mathbf{a}\cdot\mathbf{x} = 0$, and for $a \ne 0$ these equations are correct only for $\mathbf{x} = 0$,
whereas for $a = 0$ they are correct for all $\mathbf{x} \perp \mathbf{a}$ ($\mathbf{a} \ne 0$), while for a rotation
($\ne 1$) the fixed vectors form a one-dimensional subspace.¹⁶ So we have

(36) $\mathbf{x}' = A\mathbf{x}A^{-1}$.

Since $N(A) = 1$, we could of course replace $A^{-1}$ by the quaternion $A^*$
conjugate to $A$. But we let $A^{-1}$ stand here, because (36) then represents
a rotation for an arbitrary quaternion $A \ne 0$: we need only write
$A = \sqrt{N(A)}\,A_0$, so that $\mathbf{x}' = A_0\mathbf{x}A_0^{-1}$, $N(A_0) = 1$.

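If $\mathbf{d} = \sum_{\nu=1}^{3} e_\nu d_\nu$ is the orienting vector of the axis of rotation and $\varphi$
is the angle of rotation in (36), then by (30) we may write

$A = \cos \varphi/2 + \mathbf{d}\sin \varphi/2, \quad A^* = \cos \varphi/2 - \mathbf{d}\sin \varphi/2$.

If we now transform (36) by means of (31) and note that $(\mathbf{d}\times\mathbf{x})\cdot\mathbf{d} = 0$
and also, as follows from (34), $(\mathbf{d}\times\mathbf{x})\times\mathbf{d} = \mathbf{x} - \mathbf{d}(\mathbf{x}\cdot\mathbf{d})$, we have the
Rodrigues formula for rotations:

(37) $\mathbf{x}' = \mathbf{x}\cos\varphi + \mathbf{d}(\mathbf{x}\cdot\mathbf{d})(1 - \cos\varphi) + (\mathbf{d}\times\mathbf{x})\sin\varphi$.

This equation provides another proof of the fact that the $\mathbf{d}, \varphi$ appearing
in (30) actually have the significance assigned to them above: for we need
only prove that $\mathbf{d}' = \mathbf{d}$, $\mathbf{x}\cdot\mathbf{d} = \mathbf{x}'\cdot\mathbf{d}$ and also, with the abbreviations

$\mathbf{y} = \mathbf{x} - \mathbf{d}(\mathbf{x}\cdot\mathbf{d}), \quad \mathbf{y}' = \mathbf{x}' - \mathbf{d}(\mathbf{x}\cdot\mathbf{d})$, that $|\mathbf{y}| = |\mathbf{y}'|$,
$\mathbf{y}\cdot\mathbf{y}' = |\mathbf{y}|^2\cos\varphi, \quad (\mathbf{y}\times\mathbf{y}')\cdot\mathbf{d} = |\mathbf{y}|^2\sin\varphi$.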
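As a numerical illustration, the sketch below compares the quaternion form (36) with the Rodrigues formula (37) for one unit axis vector and one angle. It assumes the usual multiplication table for the units $j, k, l$ (with $jk = l$); the function names are hypothetical.

```python
import math

def qmul(p, q):
    # Quaternion product, components ordered (scalar, j, k, l), assuming j*k = l.
    a0, a1, a2, a3 = p
    b0, b1, b2, b3 = q
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def rotate_by_quaternion(d, phi, x):
    # (36) with A = cos(phi/2) + d sin(phi/2); for N(A) = 1 the inverse is the conjugate.
    c, s = math.cos(phi / 2), math.sin(phi / 2)
    A = (c, s * d[0], s * d[1], s * d[2])
    A_conj = (c, -s * d[0], -s * d[1], -s * d[2])
    return qmul(qmul(A, (0.0,) + tuple(x)), A_conj)

def rodrigues(d, phi, x):
    # (37): x' = x cos(phi) + d (x.d)(1 - cos(phi)) + (d x x) sin(phi).
    c, s = math.cos(phi), math.sin(phi)
    xd, dxx = dot(x, d), cross(d, x)
    return tuple(x[i] * c + d[i] * xd * (1 - c) + dxx[i] * s for i in range(3))

d = (1 / 3, 2 / 3, 2 / 3)          # unit vector: 1/9 + 4/9 + 4/9 = 1
phi = 1.1
x = (0.5, -2.0, 3.0)

q = rotate_by_quaternion(d, phi, x)
r = rodrigues(d, phi, x)
print(q[0], [q[i + 1] - r[i] for i in range(3)])  # all values close to 0
```

The scalar part of $A\mathbf{x}A^*$ vanishes up to rounding, and its vector part agrees with (37).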


3.4. The Theorem of Frobenius 


From §1.2 and §3.1 we now see that the two extensions of $K$, namely, the
quaternion skew field and the field $K(i)$ (with $i^2 = -1$) have the following
properties in common: each of them is a skew field with $K$ as subfield;
also $az = za$ for every element $z$ and all $a \in K$; and finally, they are both
vector spaces of finite dimension over $K$. We are thus led to the concept
of a division algebra of rank $n$ over $K$, namely, a skew extension field $L$ of $K$
with $a\alpha = \alpha a$ for all $a \in K$, $\alpha \in L$, which is of dimension $n$ regarded
as a vector space over $K$ (see IB3, §1.2). The latter property requires the
existence of elements $\beta_1, \ldots, \beta_n \in L$ such that for every $\alpha \in L$ there is
exactly one $n$-tuple $(a_1, \ldots, a_n)$ with $a_\nu \in K$ ($\nu = 1, \ldots, n$) and $\alpha = \sum_{\nu=1}^{n} \beta_\nu a_\nu$.


16 Thus (35) with the minus sign gives the reflections ($a = 0$) and the rotatory
reflections ($a \ne 0$).
The field $K$ itself is, of course, a division algebra of rank 1 over $K$, where
for the basis element $\beta_1$ we may take any element $\ne 0$ of $K$. When the
rank of the division algebra is not stated, we speak of a division algebra
of finite rank.¹⁷

The outstanding algebraic importance of the field of complex numbers 
and of the quaternion skew field now depends on the fact that they are 
the only division algebras of finite rank > 1 over the field of real numbers, 
a result that follows from the theorem of Frobenius: 


A division algebra L of finite rank over a real-closed field K is either K 
itself or else (up to isomorphism) is the field K(i) (with i? = —1) or the 
quaternion skew field over K. 


For the proof we first deduce the following property of $L$:¹⁸ for every
$\alpha \in L$ there exist $r, s \in K$ with

(38) $\alpha^2 = r\alpha + s$.

Since $L$ is of dimension $n$ over $K$, the elements $1, \alpha, \ldots, \alpha^n$ are linearly
dependent; in other words, there exist $a_\nu \in K$, not all zero, such that
$\sum_{\nu=0}^{n} a_\nu\alpha^\nu = 0$. Thus the polynomial $f(x) = \sum_{\nu=0}^{n} a_\nu x^\nu$ is of positive degree
and can therefore, by §2.3, be split into linear and quadratic factors.
Since $f(\alpha) = 0$ and $L$ has no divisors of zero, one of these factors* will
become 0 when $x$ is replaced by $\alpha$. If this factor is quadratic, we already
have the desired equation (38). But otherwise $\alpha \in K$ and we have (38) with
$r = \alpha$, $s = 0$.

Now by (11) we see that $1 \ne -1$, so that $1 + 1 \ne 0$, and thus we may
divide by 2 ($= 1 + 1$), so that (38) becomes

(39) $(\alpha - r/2)^2 = r^2/4 + s$.

Since we may assume $L \ne K$, there exists an element $\alpha$ in $L$ that is not in $K$.
For this element the right-hand side of (39) cannot be the square of an
element $t \in K$, since otherwise we would have $\alpha = r/2 + t$ or $\alpha = r/2 - t$.
Thus by (12) there exists a $t \in K$ with $r^2/4 + s = -t^2 \ne 0$, so that
from (39) we obtain $j^2 = -1$ for $j = (\alpha - r/2)t^{-1}$. If $L$ is not already
exhausted by the adjunction to $K$ of a zero of $x^2 + 1$, there must exist a


17 Such an algebra is seen at once to be an algebra in the sense of IB5, §3.9. By the
words "division algebra" alone we mean a skew extension field $L$ of $K$ with $a\alpha = \alpha a$
for all $a \in K$, $\alpha \in L$. In some investigations the concept "division algebra" is defined
more generally; i.e., so as to include the alternative fields of §3.5, in which case the
division algebras defined here must be called "associative division algebras."

18 It is only here that any use is made of the hypothesis of finite rank.

* What we need here is the fact that replacement of $x$ by $\alpha$ gives a homomorphism
of $K[x]$, but by IB4, §2.1, this follows from the easily proved fact that, for every given $\alpha$,
the $g(\alpha)$ with $g(x) \in K[x]$ are the elements of a commutative subring of $L$.
$\beta \in L$ that is not of the form $a + jb$ ($a, b \in K$). We now make the following
assertion, which will also be useful later on: for $\xi, \eta \in L$ there exist
$a, b, c \in K$ with

(40) $\xi\eta + \eta\xi = a\xi + b\eta + c$.

For the proof of this assertion we need only write the equations
corresponding to (38) for $\xi$, $\eta$, $\xi + \eta$ and take account of the fact that
$\xi\eta + \eta\xi = (\xi + \eta)^2 - \xi^2 - \eta^2$. In particular, we now use (40) with
$\xi = j$, $\eta = \beta$:

(41) $j\beta + \beta j = aj + b\beta + c$.

Then for arbitrary $u, v \in K$ we have

$(uj + v\beta)^2 = -u^2 + uvc + uvaj + uvb\beta + v^2\beta^2$.

From the equations corresponding to (38)

(42) $\beta^2 = r'\beta + s' \quad (r', s' \in K)$,
$(uj + v\beta)^2 = r''(uj + v\beta) + s'' \quad (r'', s'' \in K)$,

we further have

$r''uj + r''v\beta + s'' = uvaj + (uvb + v^2r')\beta + (-u^2 + uvc + v^2s')$.

Since $\beta$ is not a linear combination of $1, j$ with coefficients from $K$, the
elements $1, j, \beta$ are linearly independent over $K$, so that comparison of
coefficients in the last equation gives

$r''u = uva, \quad r''v = uvb + v^2r'$.

For $u, v \ne 0$ we thus have

$ub = v(a - r')$.

If in this equation we first set $u = v = 1$ and then $u = 1$, $v = -1$
it follows, since $a, b, r'$ do not depend on $u, v$, that

(43) $b = 0, \quad a = r'$.

With $x, y, z \in K$ we now write $k = x\beta - yj - z$, and from (41), (42),
(43) we calculate

$k^2 = x^2s' - y^2 + z^2 - xyc + (xa - 2z)(x\beta - yj)$,
$jk + kj = (xc + 2y) + (xa - 2z)j$.
These equations suggest that we should require

$xc + 2y = 0, \quad xa - 2z = 0$,

as may obviously be done for arbitrary $x \ne 0$, so that $k^2 \in K$ and
$jk + kj = 0$. But since $x \ne 0$ and $1, j, \beta$ are linearly independent, the
element $k$ cannot lie in $K$, so that $k^2$ is not the square of an element of $K$,
and thus by (12) the element $-k^2$ is such a square. Thus we can choose $x$
in such a way that $k^2 = -1$. But then the elements $j, k$ satisfy those
equations (24) in which $l$ does not occur. If we now set $l = jk$, we have
$l^2 = jkjk = -jkkj = jj = -1$, $kl = -kkj = j$, $lk = jkk = -j$,
$lj = jkj = -kjj = k$, $jl = jjk = -k$, so that (24) is completely satisfied.

From $l = a + jb + kc$ with $a, b, c \in K$ it would follow, by
left-multiplication with $j$, that $-k = ja - b + lc$, so that

$-k = j(a + cb) + (ac - b) + kc^2$

and thus, on account of the linear independence of $1, j, k$, we would
have $c^2 = -1$, which is impossible by (11). Thus $1, j, k, l$ are linearly
independent. In order to show that $L$ is the quaternion skew field over $K$,
we need only express every element $\xi \in L$ as a linear combination of
$1, j, k, l$ with coefficients from $K$. For this purpose we note that, under
the single condition $j^2 = -1$, we have from (41), (43) the formula
$j\beta + \beta j = aj + c$ with $a, c \in K$; it is true that in the proof of (43) we
made the further assumption that $\beta$ is linearly independent of $1, j$, but if
this is not the case our assertion follows at once from (41) (though with
other values of $a, c$). We thus have the equations

$j\xi + \xi j = aj + c$,
$k\xi + \xi k = a'k + c'$,
$l\xi + \xi l = a''l + c''$,

and consequently

$l(j\xi + \xi j)k - k(k\xi + \xi k) - l(l\xi + \xi l) = (-a + a' + a'') - jc - kc' - lc''$.

Since the left-hand side of this equation is equal to $2\xi$, the desired
representation for $\xi$ is thereby obtained.
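The final step of the proof rests on the fact that the left-hand side of the last equation equals $2\xi$. The sketch below checks this in the concrete case of the real quaternions, again assuming the usual multiplication table with $jk = l$; the helper names are hypothetical.

```python
def qmul(p, q):
    # Quaternion product, components ordered (scalar, j, k, l), assuming j*k = l.
    a0, a1, a2, a3 = p
    b0, b1, b2, b3 = q
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def qadd(p, q):
    return tuple(x + y for x, y in zip(p, q))

def qsub(p, q):
    return tuple(x - y for x, y in zip(p, q))

def anticommutator(u, x):
    # u*x + x*u, the expression appearing on the left of (40).
    return qadd(qmul(u, x), qmul(x, u))

J, K, L = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
xi = (2, -3, 5, 7)                     # an arbitrary test element

# l(j xi + xi j)k - k(k xi + xi k) - l(l xi + xi l)
lhs = qsub(qsub(qmul(qmul(L, anticommutator(J, xi)), K),
                qmul(K, anticommutator(K, xi))),
           qmul(L, anticommutator(L, xi)))
print(lhs, tuple(2 * t for t in xi))
```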


3.5. Cayley Numbers 


In the theorem of Frobenius the only one of the “‘usual rules for calcula- 
tion” that is given up is the commutativity of multiplication, and as 
a result we obtain, together with the field of complex numbers, the 
quaternion skew field. It is now natural to ask what new structures we 
could obtain by giving up further rules of calculation, or perhaps by 
only weakening them. A complete answer to this question can be given 
in the following special case: the associative law for multiplication, which
is required in skew fields, is replaced by the rules

(44) $a(ab) = (aa)b, \quad (ab)b = a(bb)$.

Algebraic structures of this kind are called alternative fields, a name
which is explained by the fact that if the other rules (for example the
distributive laws) are retained, the rules (44) have the following
consequence: the associator $(ab)c - a(bc)$ is changed to $-(ab)c + a(bc)$ when
any two of the elements $a, b, c$ are interchanged. If in the definition of
"division algebra" we replace "skew field" by "alternative field," the
theorem of Frobenius must now be extended, to the effect that (up to
isomorphism) there exists exactly one further division algebra, which
is of rank 8.¹⁹ It is called a Cayley algebra, and its elements are called
Cayley numbers or octaves.²⁰ We can obtain these numbers most
conveniently if we start from the quaternion skew field $Q$ over $K$ and in the
set of pairs $(A, B)$ with $A, B \in Q$ we define addition and multiplication in
the following way, where the conjugate quaternion is again denoted by
an asterisk:²¹

(45) $(A_1, B_1) + (A_2, B_2) = (A_1 + A_2, B_1 + B_2)$,
$(A_1, B_1)(A_2, B_2) = (A_1A_2 - B_2B_1^*, A_2B_1 + A_1^*B_2)$.

Since $A \to (A, 0)$ is then an isomorphism of $Q$, we may set $A = (A, 0)$,
so that we actually have an extension of $K$, and even of $Q$. The fact that
multiplication is not associative is seen from the following example,
in which for abbreviation we have set $(0, 1) = E$:

$(Ej)k = (0, -l), \quad E(jk) = (0, l)$.

By (45) it is easy to see that the mapping $(A, B) \to (A^*, -B)$ is an
antiautomorphism of the Cayley algebra. Then $\Gamma^* = (A^*, -B)$ is called the
conjugate Cayley number of $\Gamma = (A, B)$. The real number

$\Gamma\Gamma^* = \Gamma^*\Gamma = N(A) + N(B)$

is called the norm $N(\Gamma)$ of $\Gamma$. In the same way as for complex numbers
and quaternions, we have $N(\Gamma_1\Gamma_2) = N(\Gamma_1)N(\Gamma_2)$, so that we may write


19 The fact that only the ranks 1, 2, 4, 8 are possible can be proved without (44). For
the case that $K$ is the field of real numbers, this proof was already given by J. Milnor
(Ann. of Math. 68, 444-449, 1958). From this fact, by using a result of A. Tarski
(A Decision Method for Elementary Algebra and Geometry, 2nd ed., Univ. of California
Press, 1951) we can then prove, for every natural number $n \ne 1, 2, 4, 8$, that the rank $n$
is impossible for an arbitrary real-closed $K$.

20 In analogy with the quaternions (of rank 4, whereas here the rank is 8).

21 For details see, e.g., Pickert [3], Section 6.3.


482 PART B ARITHMETIC AND ALGEBRA 


the product of sums of 8 squares again in the form of a sum of 8 squares.²² Since
multiplication is no longer associative, we cannot prove this assertion
in the same way as for quaternions, namely, from the fact that passage
to conjugates is an antiautomorphism. Nevertheless, it is easy to show
from (45) that

$N(\Gamma_1\Gamma_2) = N(\Gamma_1)N(\Gamma_2) - (CB_2^* + (CB_2^*)^*) + (B_2^*C + (B_2^*C)^*)$

with $C = A_1A_2B_1$, and since the scalar part of the product of two
quaternions is independent of the order of the factors, we thus have the desired
assertion.
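The construction can be tried out directly. The sketch below implements the multiplication (45) on pairs of quaternions, assuming the usual multiplication table for $j, k, l$ with $jk = l$, and checks the non-associative example as well as the multiplicativity of the norm on the two Cayley numbers with coordinates 1, ..., 8 and 9, ..., 16. All function names are hypothetical.

```python
def qmul(p, q):
    # Quaternion product, components ordered (scalar, j, k, l), assuming j*k = l.
    a0, a1, a2, a3 = p
    b0, b1, b2, b3 = q
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def qconj(p):
    return (p[0], -p[1], -p[2], -p[3])

def qadd(p, q):
    return tuple(x + y for x, y in zip(p, q))

def qsub(p, q):
    return tuple(x - y for x, y in zip(p, q))

def qnorm(p):
    return sum(x * x for x in p)

def cmul(g1, g2):
    # (45): (A1, B1)(A2, B2) = (A1 A2 - B2 B1*, A2 B1 + A1* B2).
    (a1, b1), (a2, b2) = g1, g2
    return (qsub(qmul(a1, a2), qmul(b2, qconj(b1))),
            qadd(qmul(a2, b1), qmul(qconj(a1), b2)))

def cnorm(g):
    return qnorm(g[0]) + qnorm(g[1])

ZERO, ONE = (0, 0, 0, 0), (1, 0, 0, 0)
J, K, L = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
E = (ZERO, ONE)

def embed(a):
    # The quaternion A is identified with the pair (A, 0).
    return (a, ZERO)

left = cmul(cmul(E, embed(J)), embed(K))    # (E j) k
right = cmul(E, embed(qmul(J, K)))          # E (j k)

g1 = ((1, 2, 3, 4), (5, 6, 7, 8))
g2 = ((9, 10, 11, 12), (13, 14, 15, 16))
product = cmul(g1, g2)
print(left, right, cnorm(product), cnorm(g1) * cnorm(g2))
```

The eight coordinates of `product` then give one way of writing the product of the two sums of squares as a sum of 8 squares; their order and signs depend on the conventions assumed here.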

Let us note that the passage from quaternions to Cayley numbers is
quite analogous to the procedure described in §3.1 for passing from the
complex numbers to the quaternions; in order to display this analogy
we have only to replace the quaternion $A$ in (19) by the pair $(\alpha, \beta)$,
whereupon the matrix multiplication becomes

$(\alpha_1, \beta_1)(\alpha_2, \beta_2) = (\alpha_1\alpha_2 - \bar\beta_1\beta_2, \beta_1\alpha_2 + \bar\alpha_1\beta_2)$.

The concept of a Cayley algebra can be generalized in the following
way. We take for $Q$ a generalized quaternion skew field, choose an element
$c \in K$ that is not the norm of an element of $Q$, and in the second equation
(45) replace the term $-B_2B_1^*$ by $cB_2B_1^*$. These (generalized) Cayley
algebras are again alternative fields. Their great importance lies in the
fact, not discovered until 1950/51, that every alternative field which is
not already a skew field must be such a Cayley algebra.²³ The alternative
fields play an important role in the study of projective planes: just as
the incidence axioms for the projective plane and the theorem of Desargues
allow us to construct a plane coordinate geometry over a skew field,
so the little Desargues theorem, i.e., the special case with incidence of
center and axis, and the incidence axioms for plane projective geometry
allow us to construct a coordinate geometry over an alternative field
(R. Moufang, 1933). One can also say (in a sense that can be made quite
precise): in exactly the same way as the Desargues theorem corresponds
to the associative law for multiplication, the little Desargues theorem
corresponds to the alternative law (44).²⁴


22 Numerical example:
$(1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2 + 7^2 + 8^2)(9^2 + 10^2 + 11^2 + 12^2 + 13^2 + 14^2 + 15^2 + 16^2)$
$= 36^2 + 38^2 + 54^2 + 62^2 + 72^2 + 108^2 + 112^2 + 474^2$.

23 For details see, e.g., Pickert [3], Section 6.3.

24 See, e.g., Pickert [3], p. 134 (theorem 27), and p. 187 (footnote 1).


CHAPTER 9 


Lattices 


Introduction 


For the connectives $\wedge$ (and) and $\vee$ (or) between statements and the
connectives $\cap$ (intersection) and $\cup$ (union) between sets we have the
following rules (see IA, §§2, 7, 9):

Commutative laws: $a \sqcap b = b \sqcap a$; $a \sqcup b = b \sqcup a$;
Associative laws: $a \sqcap (b \sqcap c) = (a \sqcap b) \sqcap c$; $a \sqcup (b \sqcup c) = (a \sqcup b) \sqcup c$;
Absorption laws: $a \sqcap (a \sqcup b) = a$; $a \sqcup (a \sqcap b) = a$.

Here the symbols $\sqcap, \sqcup$ are intended to suggest $\wedge, \vee$ and $\cap, \cup$ but to be
distinct from them.


If $a, b, c$ are subgroups of a group and if $a \sqcap b$ is the greatest subgroup
common to $a$ and $b$, and $a \sqcup b$ is the smallest one of the subgroups
containing both $a$ and $b$, then the rules listed above still hold, but the
further rules for $\wedge, \vee$ and $\cap, \cup$ (see IA, loc. cit.) are in general no longer
valid. The same remark holds for the normal subgroups of a group,
the subrings of a ring, the ideals of a ring, the subfields of a field, the
sublattices of a lattice, and in general (cf. IB10) for the subconfigurations
of an (algebraic) configuration with respect to a given structure. The
concepts and theories of the present chapter are thus of importance for a
general theory of structure, the basic features of which will be discussed
in the next chapter.

A set with two connectives satisfying the above laws was called a
dual group by Dedekind (1897); today it is customary to use the name
lattice introduced by G. Birkhoff in 1933 (the corresponding French
name treillis is due to Châtelet (1945) and the German Verband to Fritz
Klein-Barmen (1932)); the reason for the choice of such a name is given
on page 487.
Quite apart from its importance for the general theory of structure,
there is a certain attractiveness in studying an algebraic structure in which 
the convenient rules of commutativity and associativity are satisfied and 
there is complete duality between the two connectives; moreover, this 
structure is of quite different character from other well-known structures; 
e.g., neither of the two connectives is uniquely invertible. The study of 
Boolean lattices is even of practical importance for the construction of 
electric circuits (circuit algebra). 

In §1 we make use of the lattice Pe of all subsets of a given set e to 
introduce the fundamental concepts of the theory of lattices, in §2 we give 
examples, the large number of which indicates the importance of the 
theory, and then in the following sections we describe some of the charac- 
teristic properties of a few particular kinds of lattices. 

In the present chapter we can give only a first introduction to the theory. 
For further study we recommend the outstanding textbook: 


G. Birkhoff, Lattice Theory, Amer. Math. Soc. Coll. Publ., 2nd ed., 
1948, supplemented by the later work of the same author: 

Proceedings in Pure Mathematics, Vol. II, Lattice Theory, Amer. Math. 
Soc., Providence, R.I., 1961. 


Other important textbooks are: 


M. L. Dubreil-Jacotin, L. Lesieur, R. Croisot, Leçons sur la théorie des
treillis, des structures algébriques ordonnées et des treillis géométriques,
Paris 1953.

H. Hermes, Einführung in die Verbandstheorie, Verl. Springer, Berlin,
Göttingen and Heidelberg, 1955.

G. Szász, Einführung in die Verbandstheorie, Budapest, 1962.

In many respects our discussion is based directly on the book of Hermes,
to which we shall often refer below.

The first comprehensive account was given in the encyclopedia
article:

H. Hermes and G. Köthe, Theorie der Verbände, Enz. d. Math. Wiss.,
2nd ed., I, 13, 1939.


This article also includes important historical information. 

Since the theory of lattices is closely related to structure theory and 
consequently to mathematical logic, we use the logical symbols introduced 
in IA: 


$\neg$ not        $\wedge$ and        $\vee$ or
$\bigwedge_x$ for all $x$        $\bigvee_x$ there exists an $x$ such that
$\to$ if ... then        $\leftrightarrow$ if and only if
1. Properties of the Power Set


Let e be a set with elements denoted by lower-case Greek letters and 
subsets by lower-case italic letters. Let Pe be the set of subsets of e, 
the so-called power set. It is with this power set that the discussion in 
the present section deals almost exclusively; the reader should keep in 
mind that the elements of Pe are the subsets of e. 


Pe has the following properties (cf. IA, §9): 


I. In Pe there is defined a two-place relation, namely inclusion $\subseteq$.
This relation is

(1r) reflexive: $a \subseteq a$,
(1t) transitive: $a \subseteq b \wedge b \subseteq c \to a \subseteq c$,
(1i) identitive: $a \subseteq b \wedge b \subseteq a \to a = b$,

and is thus an order (in the sense of $\le$). We use the word order here
instead of "semiorder" or "partial order." If it is also true, which generally
will not be the case in the present section, that

(1l) $\bigwedge_a \bigwedge_b (a \subseteq b \vee b \subseteq a)$,

then we will speak of a linear order (cf. IA, §8.3).

Problem: For which sets e is Pe linearly ordered by inclusion? 

Definition: A set in which an order is defined is called an ordered set, 
and a linearly ordered set is also called a chain. 

In general, we shall denote an order relation by < (read ‘“‘smaller 
than or equal to’’), reserving € for inclusion of sets. The relation ‘“‘smaller 
than or equal to” for real numbers will be denoted by <,. By a < b we 
mean $a \le b \wedge a \ne b$, and by $a \ge b$ we mean $b \le a$.

To describe an ordered set, or (later) a lattice, it is necessary to state
the ordering relation, so that such a set is completely described by a symbol
like $(M, \le)$. But where no misunderstanding can arise, we shall speak
simply of the "ordered set $M$" or of the "lattice $M$."

The order of a finite set $M$ can be described by an order diagram,
also called a Hasse diagram. For such a diagram we first make the following
definition.

Definition: $a$ is said to be a lower neighbor of $b$ if $a < b$ and there is
no element of $M$ between $a$ and $b$; in other words, if $\neg\bigvee_c (a < c \wedge c < b)$;
i.e.,

$a \le c \le b \to c = a \vee c = b$.


The elements of M are then denoted by points in a plane in such a way 
that if a is a lower neighbor of 6, the point a is lower on the page than 
the point $b$ and is joined to $b$ by a straight line. For example, Figure 1
indicates an ordered set with the following relations and no others:

$n \le n, \quad n \le a, \quad n \le b, \quad n \le c, \quad n \le d$,
$a \le a, \quad a \le b, \quad b \le b, \quad c \le c, \quad c \le d, \quad d \le d$.

[Fig. 1. Order diagram of this ordered set.]

II. In Pe we have defined two connectives, which to a
pair $(a, b)$ of subsets assign the intersection $a \cap b$ and the
union $a \cup b$, respectively. In terms of the elements of $e$ these connectives
can be defined as follows (cf. IA, §7):

$\xi \in a \cap b \leftrightarrow \xi \in a \wedge \xi \in b$,
$\xi \in a \cup b \leftrightarrow \xi \in a \vee \xi \in b$,

but they can also be defined by the order alone without reference to the
elements of $e$; namely, $a \cap b$ is the greatest common lower element, or
the greatest lower bound, and $a \cup b$ is the least common upper element, or
least upper bound. In place of the pair of elements of Pe, namely the two
subsets $a, b$ of $e$, we now consider an arbitrary subset $N$ of an ordered
set $M$.

Definition: $s$ is called an upper bound of $N$ if $\bigwedge_{x \in N} x \le s$; and $v$ is called
the least upper bound of $N$ if

(2v, 1) $v$ is an upper bound of $N$, so that $\bigwedge_{x \in N} x \le v$, and

(2v, 2) $v$ is smaller than or equal to every upper bound of $N$:
$\bigwedge_s (\bigwedge_{x \in N} x \le s \to v \le s)$.


Not every subset of an ordered set has a least upper bound or even an 
upper bound; for example, in the set with the order diagram of Figure 1
the subset consisting of the elements $a, c$ has no upper bound, and in
the set of positive rational numbers ordered by <, the subset of numbers 
with x? <, 2 has upper bounds, but no least upper bound. 

From (2v, 2) it follows that a set $N$ cannot have more than one least
upper bound. (If $v$ and $w$ are least upper bounds, then $v \le w$ and $w \le v$.)
It is this fact that justifies our speaking of the least upper bound of $N$.
It is also called the smallest common upper element or the join and is
denoted by $\bigsqcup_{x \in N} x$ or $\bigsqcup_N x$. In case $N$ contains only two elements $a, b$, we
write $v = a \sqcup b$. For this case let us write out the definition once again:

(3v) $v = a \sqcup b \leftrightarrow a \le v \wedge b \le v \wedge \bigwedge_s [(a \le s \wedge b \le s) \to v \le s]$,

in other words: $a \sqcup b$ is characterized by the following relations:

(3v') $a \le a \sqcup b \wedge b \le a \sqcup b \wedge \bigwedge_s [(a \le s \wedge b \le s) \to a \sqcup b \le s]$.
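Correspondingly, the greatest lower bound (greatest common lower element,
intersection, or meet) is defined as follows:

(2d) $d = \mathop{\sqcap}_{x \in N} x \leftrightarrow \bigwedge_{x \in N} d \le x \wedge \bigwedge_t (\bigwedge_{x \in N} t \le x \to t \le d)$,

or for two elements:

(3d) $d = a \sqcap b \leftrightarrow d \le a \wedge d \le b \wedge \bigwedge_t [(t \le a \wedge t \le b) \to t \le d]$,

or

(3d') $a \sqcap b \le a \wedge a \sqcap b \le b \wedge \bigwedge_t [(t \le a \wedge t \le b) \to t \le a \sqcap b]$.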


Definition: An ordered set in which a meet and a join exist for every
pair of elements is called a lattice; if for every subset of the ordered set
there exists a meet and a join (in this case it is more usual to say "greatest
lower bound" and "least upper bound"), the lattice is said to be complete.
If only one of the two elements "meet" and "join" is known to exist,
we speak of a semilattice.


The name lattice is explained by the order diagrams (e.g., Figures 2 to 4,
p. 492), in which each of the elements is joined by a straight line segment after
the manner of a latticework for vines.
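The definitions of upper bound, join and meet can be evaluated mechanically for a finite ordered set. The following sketch does so for the ordered set of Figure 1, taking its order relation to be exactly the list of relations given with that figure (elements n, a, b, c, d); the function names are hypothetical.

```python
ELEMENTS = ["n", "a", "b", "c", "d"]

# The order relation of Figure 1: reflexive pairs plus the listed relations.
LEQ = {(x, x) for x in ELEMENTS} | {
    ("n", "a"), ("n", "b"), ("n", "c"), ("n", "d"), ("a", "b"), ("c", "d")
}

def leq(x, y):
    return (x, y) in LEQ

def upper_bounds(subset):
    return [s for s in ELEMENTS if all(leq(x, s) for x in subset)]

def lower_bounds(subset):
    return [t for t in ELEMENTS if all(leq(t, x) for x in subset)]

def join(subset):
    # Least upper bound as in (2v, 1) and (2v, 2); None if it does not exist.
    ub = upper_bounds(subset)
    least = [v for v in ub if all(leq(v, s) for s in ub)]
    return least[0] if least else None

def meet(subset):
    # Greatest lower bound as in (2d); None if it does not exist.
    lb = lower_bounds(subset)
    greatest = [d for d in lb if all(leq(t, d) for t in lb)]
    return greatest[0] if greatest else None

def is_lattice():
    return all(join([x, y]) is not None and meet([x, y]) is not None
               for x in ELEMENTS for y in ELEMENTS)

print(upper_bounds(["a", "c"]), join(["a", "b"]), meet(["a", "c"]), is_lattice())
```

Every pair has a meet here but not every pair has a join, so this ordered set is a semilattice and not a lattice.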


The lattice Pe is complete, since in this case the join and meet are simply 
the union and intersection respectively of the subsets. The set-theoretic 
connectives will be denoted by round signs to distinguish them from the 
lattice-theoretic connectives with rectangular signs, and the logical 
connectives with acute-angled signs. 

From the definition of join and meet we have the following rules: 

The commutative laws: 


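(4, 1, d) $a \sqcap b = b \sqcap a$, (4, 1, v) $a \sqcup b = b \sqcup a$,

the associative laws:

(4, 2, d) $a \sqcap (b \sqcap c) = (a \sqcap b) \sqcap c$, (4, 2, v) $a \sqcup (b \sqcup c) = (a \sqcup b) \sqcup c$,

the absorption laws:

(4, 3, d) $a \sqcap (a \sqcup b) = a$, (4, 3, v) $a \sqcup (a \sqcap b) = a$.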


Remarks: 1. On the basis of the associative laws we may omit the
brackets when the connectives are applied to finitely many elements
(cf. IB1, §1.3).

2. The definitions and these rules for calculation are dual in the following
sense: if in a valid theorem involving $\le, \sqcap, \sqcup$ we interchange $\le$ with $\ge$
and $\sqcap$ with $\sqcup$, the resulting theorem is also valid. The principle of duality
in projective geometry is a special case (cf. §2.2).

Instead of using an order relation, we may also define a lattice by means
of two connectives with the properties (4, 1-3). In order to make this
statement completely clear, we introduce a new symbol: a set $M$ in which
two connectives $\top, \bot$ are defined, satisfying the rules $a \top b = b \top a$ and so
forth, as in (4), is called (after Dedekind) a dual group. The above remarks
(if the proofs for the rules (4) are carried out) show that every lattice is
a dual group. We now assert conversely: every dual group is a lattice.

For the proof we must show that in every dual group we can introduce
an order with the property that for every two elements $a, b$ there exists
a least common upper element $a \sqcup b$ and a greatest common lower
element $a \sqcap b$. In fact, we can arrange matters so that $a \sqcup b = a \bot b$ and
$a \sqcap b = a \top b$.

From (4) we first deduce two further relations:

(5d) $a \top a = a$, (5v) $a \bot a = a$;

(6) $a \bot b = b \leftrightarrow a \top b = a$.


Here we shall give only the proof for (5d):

In (4, 3, v) we set $b = a$:
(i) $a \bot (a \top a) = a$.
In (4, 3, d) we set $b = a \top a$:
(ii) $a \top (a \bot (a \top a)) = a$.
Then (i) and (ii) imply (5d).

We now define
(7) $a \le b \leftrightarrow a \top b = a$.
In order to preserve the duality we also define
(7') $a \ge b \leftrightarrow a \bot b = a$.

From (6) we then have: $a \le b \leftrightarrow b \ge a$.
We must now prove:

a) $\le$ is an order relation; i.e., we have (1r), (1t), (1i). As an example,
we give the proof of (1t). In view of (7) the assertion reads:

$(a \top b = a \wedge b \top c = b) \to a \top c = a$.
1 We here allow ourselves a temporary misuse (in the present subsection only) of the
symbols $\top, \bot$, introduced by Bourbaki in a different sense (in which they are used below
in IB10).


Proof: $a \top c = (a \top b) \top c = a \top (b \top c) = a \top b = a$.
b) With $\top$ instead of $\sqcap$ we have (3d') and thus $a \top b = a \sqcap b$.
c) With $\bot$ in place of $\sqcup$ we have (3v') and thus $a \bot b = a \sqcup b$.


(For b) it is convenient to use (7), and for c) to use (7').)

It is interesting to note that a given algebraic configuration can be 
defined either on the basis of an order relation or on the basis of con- 
nectives (cf. IB10, §1). 
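The passage from a dual group to a lattice can be followed on a small example. The sketch below takes as a hypothetical dual group the divisors of 12 with the greatest common divisor in the role of $\top$ and the least common multiple in the role of $\bot$, defines the order by (7), and checks (6), the order axioms, and the characterizations (3d') and (3v'). The choice of example and all names are assumptions made for illustration.

```python
from math import gcd

M = [d for d in range(1, 13) if 12 % d == 0]   # 1, 2, 3, 4, 6, 12

def top(a, b):
    # Plays the role of the connective written with the symbol for "top".
    return gcd(a, b)

def bot(a, b):
    # Plays the role of the dual connective; the lcm of two divisors of 12 divides 12.
    return a * b // gcd(a, b)

def leq(a, b):
    # Definition (7): a <= b if and only if a top b = a.
    return top(a, b) == a

# (6): a bot b = b if and only if a top b = a.
rule_6 = all((bot(a, b) == b) == (top(a, b) == a) for a in M for b in M)

# (1r), (1t), (1i) for the relation defined by (7).
reflexive = all(leq(a, a) for a in M)
transitive = all(leq(a, c) for a in M for b in M for c in M if leq(a, b) and leq(b, c))
identitive = all(a == b for a in M for b in M if leq(a, b) and leq(b, a))

# (3d'): a top b is the greatest common lower element of a and b.
is_meet = all(leq(top(a, b), a) and leq(top(a, b), b)
              and all(leq(t, top(a, b)) for t in M if leq(t, a) and leq(t, b))
              for a in M for b in M)

# (3v'): a bot b is the least common upper element of a and b.
is_join = all(leq(a, bot(a, b)) and leq(b, bot(a, b))
              and all(leq(bot(a, b), s) for s in M if leq(a, s) and leq(b, s))
              for a in M for b in M)

print(rule_6, reflexive, transitive, identitive, is_meet, is_join)
```

In this example the order obtained from (7) is divisibility.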


IIl. In Pe we have the distributive laws 
(8d) an(buc)=(and)u(ano), 
(8v) au(bnc)=(aubd)n (auc). 


These laws do not follow from the above laws. For example, they do 
not hold in the lattice (d) with the order diagram in Figure 2 (page 492). 
Here we have 


$a \sqcap (b \sqcup c) = a$,
$(a \sqcap b) \sqcup (a \sqcap c) = n \sqcup n = n$.


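However, either of the two laws (8) follows from the other; for example,
(8v) from (8d):

$(a \sqcup b) \sqcap (a \sqcup c) = [(a \sqcup b) \sqcap a] \sqcup [(a \sqcup b) \sqcap c]$ by (8d)
$= a \sqcup [(a \sqcup b) \sqcap c]$ by (4, 3, d)
$= a \sqcup (a \sqcap c) \sqcup (b \sqcap c)$ by (8d) and (4, 2, v)
$= a \sqcup (b \sqcap c)$ by (4, 3, v).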


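Furthermore, in every lattice we have the distributive inequality
(8d') $(a \sqcap b) \sqcup (a \sqcap c) \le a \sqcap (b \sqcup c)$

since
$a \sqcap b \le a$ and $a \sqcap b \le b \le b \sqcup c$
and
$a \sqcap c \le a$ and $a \sqcap c \le c \le b \sqcup c$.

In order to prove (8d) we thus need only prove that
(8d'') $a \sqcap (b \sqcup c) \le (a \sqcap b) \sqcup (a \sqcap c)$.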


Definition: a lattice in which the distributive laws hold is called a 
distributive lattice. 
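
The failure of (8d) can be checked mechanically. The sketch below assumes that lattice (d) is the five-element lattice with zero n, unit e and three pairwise incomparable elements a, b, c between them (as the values quoted for this lattice suggest); the function names are illustrative only.

```python
# Assumed model of lattice (d): n below everything, e above everything,
# and a, b, c pairwise incomparable.
ELEMENTS = ["n", "a", "b", "c", "e"]

def leq(x, y):
    return x == y or x == "n" or y == "e"

def meet(x, y):
    if leq(x, y):
        return x
    if leq(y, x):
        return y
    return "n"  # two distinct middle elements meet in the zero element

def join(x, y):
    if leq(x, y):
        return y
    if leq(y, x):
        return x
    return "e"  # two distinct middle elements join in the unit element

# The two sides of (8d) for the elements a, b, c.
left = meet("a", join("b", "c"))
right = join(meet("a", "b"), meet("a", "c"))

def distributive_inequality_holds():
    """(8d'): (x meet y) join (x meet z) <= x meet (y join z) for all x, y, z."""
    return all(leq(join(meet(x, y), meet(x, z)), meet(x, join(y, z)))
               for x in ELEMENTS for y in ELEMENTS for z in ELEMENTS)
```

Here left evaluates to "a" and right to "n", so (8d) fails, while the inequality (8d') still holds for every triple.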

We cannot enter here upon the subject of infinite distributive laws
in complete lattices, although they are of fundamental importance for
the representability of a lattice as the lattice of subsets of a set (cf. Hermes
[1], and in particular §24).


IV. The set Pe has a least and a greatest element, namely the empty 
set and e. 


Definition: an element of an ordered set is called a 


least element or zero element (n), if $\bigwedge_x n \le x$,
greatest element or unit element (e), if $\bigwedge_x x \le e$.


V. Definition: If a is an element of the lattice M with zero element n 
and unit element e, then a’ is the complement of a if 


(9) $a \sqcap a' = n$ and $a \sqcup a' = e$.


A lattice in which every element has a complement is said to be com- 
plemented. 

A distributive complemented lattice is called a Boolean lattice. For 
example, Pe is a Boolean lattice. Boolean lattices are discussed in IA, §9, 
so that we shall not deal with them here. 


VI. The set Pe has a class of distinguished elements, namely the 
subsets of e that consist of exactly one element $\xi$. In the lattice, they are
the upper neighbors of the zero element. Every element of Pe is a union 
of such elements. 

In general, in an ordered set with zero element n the upper neighbors 
of the zero element are called atoms (occasionally also points). A lattice 
is called atomic if every element other than n is an upper bound for at 
least one atom. 

Thus Pe is a complete atomic Boolean lattice, and these properties 
characterize Pe in the following sense: 


Theorem. Every complete atomic Boolean lattice is isomorphic to the 
lattice of subsets of a set (namely the set of its atoms). 
For the proof we must refer again to Hermes [1]. 


2. Examples 


2.1. Lattices of Subgroups 


In the set e let there be defined a connective (to be denoted by simple 
juxtaposition) with respect to which e forms a group. Instead of Pe we 
now consider the set Ue of all subgroups of the group e. In Ue also there
is an order defined by (set-theoretic) inclusion, and it is well known that
in this case the symbols $a \subseteq b$ or $a \le b$ can be read as "a is a subgroup
of b." Also it is well known that the intersection $a \cap b$ is likewise a
subgroup of e and is thus in Ue, and in fact $a \cap b$ is the greatest subgroup
of e contained in both a and b; thus


$a \sqcap b = a \cap b$.


But the set-theoretical union $a \cup b$ is not always a subgroup of e, and
thus is not always contained in Ue. Nevertheless, for two subgroups,
or even for arbitrarily many subgroups, there always exists a smallest
subgroup of e that contains them all, $a \sqcup b$ or $\bigsqcup_N a$. Thus Ue forms a
complete lattice.²

It is also true that the set of subrings or the set of ideals of a ring e 
or the set of subfields of a field e form a complete lattice. A general state- 
ment in this direction is given in IB10, §2.3. 

The existence of the join in Ue is a consequence of the following general 
theorem. 

Theorem on the least upper bound. If an ordered set M has the following
two properties:

1. every non-empty subset of M has a greatest lower bound, and

2. the subset $N \subseteq M$ has an upper bound,

then N has a least upper bound v, and in fact v is the greatest lower bound
of all the upper bounds of N.


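Proof. Let S be the set of upper bounds of N. By property 2, the set S
is not empty, and thus by property 1 the greatest lower bound v of S exists.
For v we have by definition

(V1) $\bigwedge_{s \in S} v \le s$ and (V2) $\bigwedge_w \bigl[\bigl(\bigwedge_{s \in S} w \le s\bigr) \rightarrow w \le v\bigr]$.

The assertion is that v is the least upper bound of N; or in other words

(B1) $\bigwedge_{x \in N} x \le v$ and (B2) $\bigwedge_t \bigl[\bigl(\bigwedge_{x \in N} x \le t\bigr) \rightarrow v \le t\bigr]$.

Proof for (B1). For every $x \in N$ we have $\bigwedge_{s \in S} x \le s$, so that $x \le v$
follows from (V2).

Proof for (B2). If $\bigwedge_{x \in N} x \le t$, then $t \in S$, so that $v \le t$ by (V1).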
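
The construction in this proof (least upper bound as the greatest lower bound of the set of upper bounds) can be illustrated as follows. This is a sketch on an assumed example, the divisors of 12 ordered by divisibility; the helper names are hypothetical.

```python
# Assumed ordered set M: the divisors of 12 under divisibility.
ELEMENTS = [1, 2, 3, 4, 6, 12]

def leq(a, b):
    return b % a == 0

def upper_bounds(subset):
    return [s for s in ELEMENTS if all(leq(x, s) for x in subset)]

def lower_bounds(subset):
    return [w for w in ELEMENTS if all(leq(w, s) for s in subset)]

def greatest_lower_bound(subset):
    """Return the greatest lower bound of subset, or None if there is none."""
    lows = lower_bounds(subset)
    for v in lows:
        if all(leq(w, v) for w in lows):
            return v
    return None

def least_upper_bound(subset):
    """As in the theorem: the greatest lower bound of the upper bounds."""
    return greatest_lower_bound(upper_bounds(subset))
```

For N = [2, 3] the upper bounds are 6 and 12, and their greatest lower bound is 6, which is the least common multiple of 2 and 3.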

The lattice of subgroups of a group is not necessarily distributive. 
For example, if e is the Klein four-group (see IB2, §15.3.3), then Ue has 
the order-diagram (d), Figure 2. 


2 In IB2, §3.4, we wrote $\langle a \cup b \rangle$ for $a \sqcup b$ and $\langle R \rangle$ for $\bigsqcup_N a$, with $R = \bigcup_{a \in N} a$.

If e is the commutative group with generators $\alpha$, $\beta$ and relations $\alpha^4 = \varepsilon$,
$\beta^2 = \varepsilon$ (where $\varepsilon$ is the neutral element of e), and is thus the direct product
of two cyclic groups of orders 2 and 4, then the subgroups are


$n = \{\varepsilon\}$, $a = \{\varepsilon, \alpha^2\}$, $b = \{\varepsilon, \beta\}$, $p = \{\varepsilon, \alpha, \alpha^2, \alpha^3\}$, $q = \{\varepsilon, \alpha^2, \beta, \alpha^2\beta\}$, e.


The order-diagram is given in Figure 3. The juxtaposed numbers will be
explained in §2.4. This lattice is distributive but not complemented,
since a and q have no complements.

Fig. 2. (d) Fig. 3 Fig. 4

Thus we see that various types of lattices can occur as lattices of sub- 
groups. At the present time it is not known what properties a lattice V 
must have in order that there may exist a group e for which Ue is iso- 
morphic to V, or in other words, under what conditions is V representable 
as a lattice of subgroups (cf. M. Suzuki, "Structure of a Group and the
Structure of its Lattice of Subgroups." Ergebnisse d. Math. Neue Folge,
Heft 10. Springer Verlag, Berlin, Göttingen and Heidelberg, 1956).

For a group e we have $Ue \subseteq Pe$. The order relation in Ue is the same as in Pe.
But, although Ue thus forms a lattice under the same ordering as Pe,
we do not call Ue a sublattice but only a subband (from the German
name Teilbund, introduced by Schwan).

For the concept of a sublattice we do not use the definition of a lattice 
in terms of order but in terms of the connectives (in other words, we make 
use of the definition of a dual group); if M is a lattice (dual group) with
the connectives $\sqcap$, $\sqcup$ and N is a subset of M, then N is called a sublattice
of M if N forms a lattice with respect to the same connectives.

A further example is given in Figure 4. If we omit the two points that 
are twice circled, we obtain a sublattice, but if we omit only the central 
point, we obtain only a subband. 

The reader is invited to prove: 


1. N is a sublattice of M if and only if N is closed with respect to $\sqcap$
and $\sqcup$; in other words, if and only if N contains $a \sqcap b$ and $a \sqcup b$ for
every $a, b \in N$.


2. Every sublattice is a subband. 
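
The difference between the two notions can be made concrete. The sketch below uses an assumed example that is not in the text: M is the set of divisors of 12 under divisibility (meet = gcd, join = lcm), and the function names are illustrative.

```python
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def divides(a, b):
    return b % a == 0

def is_subband(subset):
    """subset is a lattice under the inherited order (divisibility)."""
    for a in subset:
        for b in subset:
            lows = [x for x in subset if divides(x, a) and divides(x, b)]
            ups = [x for x in subset if divides(a, x) and divides(b, x)]
            has_glb = any(all(divides(w, v) for w in lows) for v in lows)
            has_lub = any(all(divides(v, w) for w in ups) for v in ups)
            if not (has_glb and has_lub):
                return False
    return True

def is_sublattice(subset):
    """subset is closed under the connectives of M (gcd and lcm)."""
    s = set(subset)
    return all(gcd(a, b) in s and lcm(a, b) in s for a in s for b in s)
```

The subset [1, 2, 3, 12] is a subband but not a sublattice: inside it the least common upper element of 2 and 3 is 12, whereas in M the join of 2 and 3 is 6, which is missing from the subset.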


2.2. Vector Spaces. Projective Geometry


The vector subspaces of a vector space can be interpreted geometrically 
as the linear subspaces of a projective space. 

Let the set e consist of the points of the (for convenience, three-dimensional)
projective space, and let Le be the set of its linear subspaces,
so that Le consists of the empty set n, the points, lines, planes, and e itself.
Let the order be defined by inclusion. Then $a \sqcap b$ is the set of elements
common to a and b, or in other words the intersection of these subspaces,
and $a \sqcup b$ is the smallest linear space which includes a and b, or in other
words it is the subspace spanned by a and b.

This lattice is complemented: for every subspace a of e there exists a
disjoint subspace a' ($a \sqcap a' = n$) which together with a spans the whole
space e ($a \sqcup a' = e$). Every element has infinitely many complements.

The set Le is not distributive. For example, if a is a plane and b, c are
two points not on a, then $a \sqcap (b \sqcup c)$ is the point of intersection of
the plane a with the line determined by b and c, but, on the other hand,


$(a \sqcap b) \sqcup (a \sqcap c) = n \sqcup n = n$.


2.3. Every Linearly Ordered Set is a Lattice, with $a \sqcap b$ as the smaller,
and $a \sqcup b$ as the greater of the two elements a and b. We write $a \sqcap b = \min(a, b)$,
$a \sqcup b = \max(a, b)$.

Every such lattice is distributive, since


$a \sqcap (b \sqcup c) = \min(a, \max(b, c))$,
$(a \sqcap b) \sqcup (a \sqcap c) = \max(\min(a, b), \min(a, c))$,

and the two right-hand sides are equal.

In verifying this statement the reader may assume $b \le c$ (since b and c
occur symmetrically) and may therefore confine his attention to the
three cases $a \le b \le c$, $b \le a \le c$, $b \le c \le a$.
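
The case distinction can also be replaced by a brute-force check on a finite chain. This is only an illustrative sketch; the function name and the choice of test values are assumptions.

```python
from itertools import product

def distributive_on_chain(values):
    """Check min(a, max(b, c)) == max(min(a, b), min(a, c)) for all triples."""
    return all(min(a, max(b, c)) == max(min(a, b), min(a, c))
               for a, b, c in product(values, repeat=3))

def dual_distributive_on_chain(values):
    """Check the dual law max(a, min(b, c)) == min(max(a, b), max(a, c))."""
    return all(max(a, min(b, c)) == min(max(a, b), max(a, c))
               for a, b, c in product(values, repeat=3))
```

Both functions return True for any finite list of comparable values, for instance list(range(6)).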

A special case is the set Z of natural numbers (with or without 0)
with $\le$ as the order: $(Z, \le)$.

In preparation for the next example we introduce the general concept 
of a direct product: let $(M_1, \le)$, $(M_2, \le)$ be two lattices. (Whether the
ordering is "the same" in both, and in fact just what such a question would
mean, is of no importance to us here.) We form the set $M = M_1 \times M_2$;
its elements are the pairs $(a_1, a_2)$, $a_1 \in M_1$, $a_2 \in M_2$. In M we define an
order by


$(a_1, a_2) \le (b_1, b_2) \Leftrightarrow a_1 \le b_1 \wedge a_2 \le b_2$.

Then M is a lattice with

$(a_1, a_2) \sqcap (b_1, b_2) = (a_1 \sqcap b_1, a_2 \sqcap b_2)$,
$(a_1, a_2) \sqcup (b_1, b_2) = (a_1 \sqcup b_1, a_2 \sqcup b_2)$.

It is easy to see that the direct product of two distributive lattices is a
distributive lattice.

We may allow the case that $M_1$ and $M_2$ are the same lattice. Furthermore,
we can form the direct product of arbitrarily many factors.


2.4. Divisibility 


Let M = Z be the set of natural numbers, not including 0. The relation
$a \mid b$ (a divides b) is an order, $a \sqcap b$ is the greatest common divisor,
and $a \sqcup b$ is the least common multiple of a and b. Then (Z, |) is a
lattice, which is distributive. The distributivity is closely connected with
the unique factorization of a number into prime factors. Let $p_1, p_2, \dots$
be the prime numbers in their natural sequence, and let


$a = \prod_\nu p_\nu^{\alpha_\nu}$, $b = \prod_\nu p_\nu^{\beta_\nu}$, $c = \prod_\nu p_\nu^{\gamma_\nu}$;


for $\alpha_\nu$, $\beta_\nu$, $\gamma_\nu$ the value 0 is also allowed; the product is to be extended
only up to the highest prime dividing a, b, c. Then $b \sqcap c = \prod_\nu p_\nu^{\min(\beta_\nu, \gamma_\nu)}$,
$a \sqcup b = \prod_\nu p_\nu^{\max(\alpha_\nu, \beta_\nu)}$, and the assertion reads: for every $\nu$


$\min(\alpha_\nu, \max(\beta_\nu, \gamma_\nu)) = \max(\min(\alpha_\nu, \beta_\nu), \min(\alpha_\nu, \gamma_\nu))$.


The proof is the same as in §2.3. By assigning a to the system $(\alpha_1, \alpha_2, \dots)$
we map the lattice (Z, |) isomorphically onto the direct product of infinitely
many factors $(Z, \le)$.
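
The correspondence between numbers and exponent systems can be sketched in code. The block below is an illustration under assumptions: only the primes 2, 3, 5, 7 are used (so inputs must factor over them), and all function names are hypothetical.

```python
from math import gcd

PRIMES = [2, 3, 5, 7]  # assumed truncation of the sequence of primes

def exponents(n):
    """Map n to its exponent system with respect to PRIMES."""
    vec = []
    for p in PRIMES:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        vec.append(e)
    if n != 1:
        raise ValueError("number has a prime factor outside PRIMES")
    return tuple(vec)

def from_exponents(vec):
    n = 1
    for p, e in zip(PRIMES, vec):
        n *= p ** e
    return n

def meet_by_exponents(a, b):
    """Greatest common divisor via the componentwise minimum."""
    return from_exponents(tuple(min(x, y) for x, y in zip(exponents(a), exponents(b))))

def join_by_exponents(a, b):
    """Least common multiple via the componentwise maximum."""
    return from_exponents(tuple(max(x, y) for x, y in zip(exponents(a), exponents(b))))
```

For a = 12 and b = 18 the exponent systems are (2, 1, 0, 0) and (1, 2, 0, 0); the componentwise minimum gives 6 and the componentwise maximum gives 36.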

A finite sublattice of (Z, |) is formed, for example, by the factors of 
the number 12. Its order diagram is presented in Figure 3. It has a zero 
element n = 1 and a unit element e = 12. It is not complemented. 
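
That this lattice is not complemented can be confirmed by listing, for each divisor, the elements that satisfy both conditions of (9). A minimal sketch, with illustrative names:

```python
from math import gcd

DIVISORS = [1, 2, 3, 4, 6, 12]  # zero element 1, unit element 12

def lcm(a, b):
    return a * b // gcd(a, b)

def complements(a):
    """All x with gcd(a, x) equal to the zero element and lcm(a, x) equal to the unit."""
    return [x for x in DIVISORS if gcd(a, x) == 1 and lcm(a, x) == 12]

complement_table = {a: complements(a) for a in DIVISORS}
```

The elements 2 and 6 have no complement, while 3 and 4 are complements of each other, as are 1 and 12.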


2.5. Circuit Algebra 


An electric circuit consists of conductors to which, by neglecting 
resistance, we may assign the conductivity 1 or 0 according to whether 
or not a current can flow between their endpoints. The conductivity 
of a circuit in which conductors with conductivity a, b are joined in series 
or in parallel is denoted by $a \sqcap b$ and by $a \sqcup b$, respectively. The values
of $a \sqcap b$ and $a \sqcup b$ are determined from those of a and b according to
the following tables:

    a   b   $a \sqcap b$   $a \sqcup b$
    0   0        0              0
    0   1        0              1
    1   0        0              1
    1   1        1              1

which are the same as the truth tables for logical conjunction and alternative
(IA, §2.4). We are dealing here with a Boolean lattice.

These examples show that various types of lattices play an important 
role in various parts of mathematics. In the following sections we give 
some important theorems for a few types of lattices. 


3. Lattices of Finite Length 


In algebra an important role is played by a certain finiteness condition, 
namely, the factor chain condition.³ In the simplest case this condition
refers to the natural numbers as follows: if, beginning with a natural
number $z_0$, we form a chain of factors, i.e., a sequence $z_0, z_1, z_2, \dots$ with
$z_{\nu+1} \mid z_\nu$, the chain contains only finitely many distinct elements, a fact
which is the basis, for example, for the factorization of every natural 
number into prime factors (though not of the uniqueness of this factoriza- 
tion). 

The factor chain condition can be formulated as a lattice property 
in the following way: 


Definition: a lattice $(M, \le)$ is said to be of finite descending length if
every descending chain $a = x_0 \ge x_1 \ge x_2 \ge \dots$ contains at most finitely
many distinct elements. The concept of finite ascending length is defined
analogously. If a lattice is of finite length both ascending and descending,
it is said to be of finite length.⁴

The lattice (Z, |) is of finite descending length but not of finite ascending
length. The lattice of linear subspaces of a projective space is of finite 
length if the dimension of the space is finite. But one of the advantages of 
a lattice-theoretic treatment of projective geometry is that it includes 
spaces of infinite dimension. 

In $x_0 \le x_1 \le \dots \le x_l$, the number l is called the length of the chain.
Statements about the length of chains are generally of interest only for
proper chains, i.e., for chains containing only distinct elements.

If a lattice is of finite length, it does not necessarily follow that the
lengths of the proper chains have an upper bound. For example, in the
lattice with the order diagram of Figure 5 the kth chain is of length k.

Fig. 5

3 Cf. IB6, §2.9. Maximality condition for ideals.

4 For our present purposes we have given a stronger (or more restrictive) meaning to
"of finite length" than the one given in Hermes ([1], page 73). A lattice is there said to
be of finite length if every chain joining any two elements is of finite length. Every
lattice of finite length in our sense is of finite length in the sense of Hermes, but not
conversely.

Theorem. Every non-empty lattice M that is of finite descending length
contains a zero element.


Proof. Let $x_0$ be an arbitrary element of M. Then we have
either $\bigwedge_x x_0 \le x$; here $x_0$ is the zero element;


or $\bigvee_x \neg(x_0 \le x)$. Here it is not necessarily true that $x < x_0$, but
then we have $x_1 = x \sqcap x_0 < x_0$. We then proceed with $x_1$ in exactly
the same way as with $x_0$. In this way we obtain a descending proper
chain, which by hypothesis has only finitely many elements, breaking off
say with the element $x_k$. But this means that for $x_k$ we have $\bigwedge_x x_k \le x$,
so that $x_k$ is the desired zero element.
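
The descent used in this proof is effectively an algorithm. The following sketch runs it on an assumed finite example (the divisors of 12 with gcd as meet); find_zero and its parameters are hypothetical names.

```python
from math import gcd

def find_zero(elements, meet, start):
    """Descend from start as in the proof; return the zero element and the chain."""
    def leq(a, b):
        return meet(a, b) == a

    current = start
    chain = [current]
    while True:
        # Look for an x that is not above the current element.
        witness = next((x for x in elements if not leq(current, x)), None)
        if witness is None:
            return current, chain  # current lies below every element
        current = meet(witness, current)  # strictly below the previous element
        chain.append(current)

zero, chain = find_zero([12, 6, 4, 3, 2, 1], gcd, 12)
```

With the elements listed in this order the descent visits 12, 6, 2, 1 and stops at the zero element 1. In an infinite lattice the loop is only guaranteed to stop under the hypothesis of finite descending length.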

Thus it is easy to see that every lattice of finite descending length is 
atomic (definition in §1, VI). But an atomic lattice is not necessarily of 
finite descending length; for example, if e is infinite, then Pe is atomic 
but is not of finite descending length. Thus atomicity represents a 
weakening of finiteness of length, just as finiteness of length is a 
weakening of finiteness. The fact that important consequences can be 
drawn from the property of atomicity is clear from §1, VI. 

For lattices of finite descending length there is an analogue to the 
factorization of a number into prime factors. We first replace the concept 
of product, which is foreign to the lattice (Z, |), by the concept of least 
common multiple: every number can be represented as the least common 
multiple of finitely many prime powers. The prime powers can be 
characterized (in a lattice-theoretic way) by the property that they cannot 
be represented as the least common multiples of other elements. Then 
the factorization theorem can be expressed as follows for lattices: 

Definition. An element q of a lattice M is said to be $\sqcup$-irreducible,
or primary (the latter term being due to a certain analogy with the concept
of "primary ideal" in rings (IB5, §3.6)), if q cannot be represented
as the join of two other elements, or in other words if


$q = x \sqcup y \rightarrow q = x \vee q = y$.

It may happen that a lattice M consists entirely of primary elements, 
as will be the case, for example, if M is linearly ordered. It can also 
happen that a lattice has no primary elements at all; for example, the
lattice of the pairs of rational numbers $(a_1, a_2)$ with $0 < a_i < 1$ under the
ordering

$(a_1, a_2) \le (b_1, b_2) \Leftrightarrow a_1 \le b_1 \wedge a_2 \le b_2$,

in which for every given pair $(a_1, a_2)$ there exist pairs $(b_1, b_2)$ with
$b_1 < a_1$, $b_2 < a_2$, and $(a_1, a_2) = (b_1, a_2) \sqcup (a_1, b_2)$.


Under what conditions is it true that every element in a lattice can be 
represented as the join of primary elements (or as we shall also say, 
can be factored into primary elements)? Here we shall discuss only
representation as the join of finitely many primary elements and not 
the question of representation as the least upper bound of arbitrarily 
many elements. 

If a itself is a primary element, we shall regard it as a factorization. 


Lemma. If a is an element that cannot be represented as the join
of finitely many primary elements, then there exists an $a_1 < a$ with the
same property.

For since a is not a primary element, there exists a factorization
$a = a_1 \sqcup b_1$ with $a_1 < a$ and $b_1 < a$. But then at least one of the two
elements $a_1$, $b_1$ cannot be factored into finitely many primary elements,
since otherwise there would exist such a factorization for a itself.

Thus if there is an element a in M that cannot be factored into finitely
many primary elements, there is a descending chain $a > a_1 > \dots$ of
infinite length. So we have the following theorem.

In a lattice of finite descending length every element can be represented
as the join of finitely many primary elements ($\sqcup$-irreducible elements).
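
The argument of the lemma translates into a recursive factorization procedure. The sketch below assumes the lattice of divisors of a number under divisibility (join = lcm); is_primary and factor are illustrative names, and the result may still contain superfluous elements in general.

```python
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def is_primary(q, elements):
    """q is join-irreducible: q == lcm(x, y) forces q == x or q == y."""
    return all(q in (x, y) for x in elements for y in elements if lcm(x, y) == q)

def factor(a, elements):
    """Represent a as a join of primary elements by splitting recursively."""
    if is_primary(a, elements):
        return [a]
    for x in elements:
        for y in elements:
            if lcm(x, y) == a and x != a and y != a:
                # Both parts lie strictly below a, so the recursion terminates.
                return sorted(set(factor(x, elements) + factor(y, elements)))

D12 = divisors(12)
primary_elements = [q for q in D12 if is_primary(q, D12)]
```

In the divisors of 12 the primary elements are 1, 2, 3, 4 (the prime powers), and factor(12, D12) yields [3, 4].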

The fact that the condition “‘of finite descending length’’ is sufficient but 
not necessary is shown by the example of an infinite linearly ordered set. 

Such a factorization is not necessarily unique, as can be seen from the 
nondistributive lattice (d), Figure 2. In the next section we shall see that 
in a distributive lattice the factorization is unique. 


4. Distributive Lattices 


In the preceding section we described the effect of a finiteness condition; 
in the present section and the next one we discuss the effect of certain 
special rules of calculation. 

We have already mentioned the theorem: in a distributive lattice every 
element can be represented in at most one way as the join of finitely many 
primary elements without superfluous elements. 

The phrase "without superfluous elements" is necessary, since, e.g.,
in the lattice of the divisors of 12 we have


$12 = 4 \sqcup 3 = 2 \sqcup 4 \sqcup 3$.


We define: in a representation $a = q_1 \sqcup \dots \sqcup q_r$, the factor $q_i$ is said
to be superfluous if the representation contains a factor $q_j$ with $q_i \le q_j$,
$i \ne j$.

The proof is exactly analogous to the proof for the uniqueness of the 
factorization of natural numbers into prime factors. In the latter proof 
an important role was played by the lemma: if p is a prime number,
then $p \mid ab$ implies $p \mid a$ or $p \mid b$. We here prove the following lemma.


Lemma 1. If M is a distributive lattice and p is primary, then
$p \le a \sqcup b \rightarrow p \le a \vee p \le b$.


Proof. The relation $p \le a \sqcup b$ means that $p = p \sqcap (a \sqcup b) =
(p \sqcap a) \sqcup (p \sqcap b)$ (by the distributive law). Since p is primary, it follows
that $p = p \sqcap a$ or $p = p \sqcap b$, or in other words $p \le a$ or $p \le b$.

Successive application of the lemma gives: if p is primary and
$p \le x_1 \sqcup x_2 \sqcup \dots \sqcup x_r$, then there exists an $x_i$ with $p \le x_i$.

Now if

$a = p_1 \sqcup \dots \sqcup p_r = q_1 \sqcup \dots \sqcup q_s$

are two decompositions of a into primary elements without superfluous
elements, then for every $p_i$ there exists at least one $q_j$ with $p_i \le q_j$ and for
this $q_j$ again a $p_k$ with $q_j \le p_k$. But from $p_i \le q_j \le p_k$ it follows that
$i = k$, since otherwise $p_i$ would be superfluous. Thus $p_i = q_j$. For every
$p_i$ there exists an equal $q_j$ and conversely.

We now have the following converse. If every element in M can be
factored in exactly one way into finitely many primary elements without
superfluous elements, then M is distributive.

For the proof we make use of an analogue to lemma 1. 


Lemma 2. If every element in M is uniquely decomposable into primary
elements (up to superfluous elements) and if p is a primary element, then


$p \le a \sqcup b \rightarrow p \le a \vee p \le b$.


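Proof. If $a = q_1 \sqcup \dots \sqcup q_r$, $b = q'_1 \sqcup \dots \sqcup q'_s$ are the unique
decompositions of a and b, then


$a \sqcup b = q_1 \sqcup \dots \sqcup q_r \sqcup q'_1 \sqcup \dots \sqcup q'_s$


is the unique decomposition, after cancellation of superfluous elements,
of $a \sqcup b$. From $p \le a \sqcup b$ it follows that


$p \sqcup a \sqcup b = a \sqcup b = p \sqcup q_1 \sqcup \dots \sqcup q_r \sqcup q'_1 \sqcup \dots \sqcup q'_s$,


which fails to be a second decomposition of $a \sqcup b$ only if p is superfluous,
or in other words if $p \le q_i$ or $p \le q'_j$.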
In order to show the distributivity we must prove


$a \sqcap (b \sqcup c) \le (a \sqcap b) \sqcup (a \sqcap c)$.


If $a \sqcap (b \sqcup c) = q_1 \sqcup \dots \sqcup q_r$, then for each of these elements q we have


$q \le a \wedge q \le b \sqcup c$,

i.e. $q \le a \wedge (q \le b \vee q \le c)$,
$(q \le a \wedge q \le b) \vee (q \le a \wedge q \le c)$,
$q \le a \sqcap b \vee q \le a \sqcap c$,

and therefore
$q \le (a \sqcap b) \sqcup (a \sqcap c)$.


5 Fundamental lemma of the theory of divisibility. IB6, §2.5.


Thus distributivity (in lattices of finite descending length) is charac- 
teristic for the uniqueness of decomposition into primary elements. 
In complemented lattices it is also characteristic for the uniqueness of 
the complement. 


Theorem. In a distributive lattice every element has at most one 
complement. 


We prove somewhat more, namely: 
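From $a \sqcup u = a \sqcup v$ and $a \sqcap u = a \sqcap v$ it follows that $u = v$.


Proof. $u = (a \sqcup u) \sqcap u = (a \sqcup v) \sqcap u = (a \sqcap u) \sqcup (v \sqcap u)$


$= (a \sqcap v) \sqcup (u \sqcap v) = (a \sqcup u) \sqcap v = (a \sqcup v) \sqcap v = v$.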
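
The role of distributivity in this theorem can be seen by comparing two small examples. The sketch below uses assumptions not taken from the text: the divisors of 30 as a distributive (in fact Boolean) lattice, and the five-element lattice with three pairwise incomparable middle elements as a nondistributive one; all names are illustrative.

```python
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def complements(a, elements, meet, join, zero, unit):
    """All x with meet(a, x) == zero and join(a, x) == unit."""
    return [x for x in elements if meet(a, x) == zero and join(a, x) == unit]

# Distributive example: divisors of 30 with gcd and lcm.
DIV30 = [d for d in range(1, 31) if 30 % d == 0]

# Nondistributive example: n < a, b, c < e with a, b, c incomparable.
DIAMOND = ["n", "a", "b", "c", "e"]

def d_leq(x, y):
    return x == y or x == "n" or y == "e"

def d_meet(x, y):
    return x if d_leq(x, y) else y if d_leq(y, x) else "n"

def d_join(x, y):
    return y if d_leq(x, y) else x if d_leq(y, x) else "e"

unique_in_div30 = all(len(complements(a, DIV30, gcd, lcm, 1, 30)) == 1 for a in DIV30)
complements_of_a = complements("a", DIAMOND, d_meet, d_join, "n", "e")
```

In the divisors of 30 every element has exactly one complement, whereas in the nondistributive example the element a has the two complements b and c.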


What we have proved here can be expressed somewhat differently if 
we introduce a new concept, as follows. 

It is easy to prove that if $a \le b$, then the elements x with $a \le x \le b$
form a sublattice, which is called the closed interval b/a (to be read:
b over a). An element x' with $x \sqcup x' = b$, $x \sqcap x' = a$ is called the relative
complement of x in b/a. A lattice is said to be relatively complemented
if every interval is complemented.

We have proved: in a distributive lattice every interval contains at most 
one relative complement for a given element. 

Conversely: if in a relatively complemented lattice M every relative 
complement is uniquely determined, then M is distributive. 

The last result follows from the fact that every nondistributive lattice 
contains a sublattice of type (d) or (m) (see §5). We shall not give the 
proof here. 

We have now made the first approach to a question of fundamental 
importance, namely the representation of a lattice as a set-lattice. In many 
applications the elements of a given lattice V are subsets of a set e: $V \subseteq Pe$.
Often the order in V coincides with inclusion in Pe, so that V is a subband 
of Pe. It is natural to ask: when is V a sublattice of Pe? A sublattice of 
Pe is called a set-lattice because in this case the meet is the intersection-set 
and the join is the union-set. For an arbitrary lattice V, whose elements 
are not assumed in advance to be subsets of a set e, the question at issue
can be stated precisely as follows: what conditions must V satisfy in order
that there may exist a set e such that V is isomorphic to a sublattice of Pe?

"Isomorphic" here means: there exists an invertible (i.e., one-to-one)
mapping $\varphi$ of V onto a subset $\varphi V$ of Pe which satisfies the homomorphism
conditions:


(H1) $\varphi(a \sqcap b) = \varphi a \cap \varphi b$, (H2) $\varphi(a \sqcup b) = \varphi a \cup \varphi b$.


A necessary condition can be stated at once: every sublattice of Pe is 
distributive, since the distributive law holds for all elements of Pe and 
thus also for the elements that form part of a subset of Pe (cf. IB10, §2.3). 
Consequently, V must be distributive. This necessary condition is also 
sufficient, although the proof requires methods that we do not develop 
here. However, our present methods enable us to prove the theorem 
in the special case that V is of finite descending length, so that every 
element is uniquely decomposable into primary elements. 

As the desired set e it is natural to consider the set of primary elements 
of V, which we denote by Q. It is also natural to assign to every element 
a of V the set $\varphi a$ of primary elements $p \le a$:


$p \in \varphi a \Leftrightarrow p \in Q \wedge p \le a$.


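The set $\varphi a$ is not empty, since a is decomposable into primary elements.
The proof of (H1) is very simple:


$p \in \varphi(a \sqcap b) \Leftrightarrow p \le a \sqcap b \Leftrightarrow p \le a \wedge p \le b$, i.e., $p \in \varphi a \wedge p \in \varphi b$.

For the proof of (H2) we require lemma 1:


$p \in \varphi(a \sqcup b) \Leftrightarrow p \le a \sqcup b \rightarrow p \le a \vee p \le b$ (by lemma 1), i.e., $p \in \varphi a \vee p \in \varphi b$.


In the opposite direction we have at once the following result:
$p \in \varphi a \cup \varphi b \Leftrightarrow p \le a \vee p \le b \rightarrow p \le a \sqcup b$, i.e., $p \in \varphi(a \sqcup b)$.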


Thus the set of images $\varphi V$ forms a sublattice of PQ. We assert that
the mapping of V onto $\varphi V$ is one-to-one and is thus an isomorphism,
and in fact we obtain the inverse mapping by assigning to $\varphi a$ the element
$\bigsqcup_{p \in \varphi a} p$. We must then prove:


$\bigsqcup_{p \in \varphi a} p = a$.

a) For all $p \in \varphi a$ we have $p \le a$, so that $\bigsqcup_{p \in \varphi a} p \le a$.


b) The element a can be represented as the join of primary elements:


$a = p_1 \sqcup \dots \sqcup p_k = \bigsqcup_{\kappa=1}^{k} p_\kappa$.

Here every $p_\kappa \le a$, so that $p_\kappa \in \varphi a$, and therefore $a = \bigsqcup_{\kappa=1}^{k} p_\kappa \le \bigsqcup_{p \in \varphi a} p$.
The assertion then follows from a) and b).

6 Hermes [1], page 106 ff.

Thus V has been mapped isomorphically onto a sublattice of PQ. 
Consequently we have the theorem: every distributive lattice of finite 
descending length is isomorphic to a set-lattice; or as we may also say,
can be represented as a set-lattice.
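
The representation can be carried out explicitly on a small distributive lattice. The following sketch assumes the divisors of 12 under divisibility (meet = gcd, join = lcm); the names Q, phi and inverse mirror the roles described in the proof but are otherwise illustrative.

```python
from math import gcd

DIVISORS = [1, 2, 3, 4, 6, 12]

def lcm(a, b):
    return a * b // gcd(a, b)

def is_primary(q):
    """q is join-irreducible in DIVISORS."""
    return all(q in (x, y) for x in DIVISORS for y in DIVISORS if lcm(x, y) == q)

# The set of primary elements, used as the underlying set of the representation.
Q = [q for q in DIVISORS if is_primary(q)]

def phi(a):
    """Assign to a the set of primary elements below it."""
    return frozenset(p for p in Q if a % p == 0)

def inverse(image):
    """Recover a as the join of the primary elements in phi(a)."""
    result = 1
    for p in image:
        result = lcm(result, p)
    return result

def is_set_representation():
    pairs = [(a, b) for a in DIVISORS for b in DIVISORS]
    h1 = all(phi(gcd(a, b)) == phi(a) & phi(b) for a, b in pairs)
    h2 = all(phi(lcm(a, b)) == phi(a) | phi(b) for a, b in pairs)
    one_to_one = all(inverse(phi(a)) == a for a in DIVISORS)
    return h1 and h2 and one_to_one
```

For instance phi(6) is the set {1, 2, 3} and phi(4) is {1, 2, 4}; their union is phi(12) and their intersection is phi(2), in agreement with (H2) and (H1).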

In a certain sense we have thus made a survey of all possible distributive 
lattices of finite descending length. 

From the above theorem we have: every distributive lattice of finite 
descending length is isomorphic to a sublattice of a Boolean lattice, or in 
other words: can be imbedded in a Boolean lattice. 


5. Modular Lattices 


5.1. In the lattice Le of linear subspaces of the (three-dimensional)
projective space the distributive law does not hold; for example,
$a \sqcap (b \sqcup c) > (a \sqcap b) \sqcup (a \sqcap c)$, if a is a plane and b, c are two points
not on the plane. But if c lies in the plane a, then $a \sqcap (b \sqcup c) = c$,
$(a \sqcap b) \sqcup (a \sqcap c) = n \sqcup c = c$. Thus in this case the distributive law is
satisfied under the additional assumption $c \le a$. The law in this weakened
form


(10) $c \le a \rightarrow a \sqcap (b \sqcup c) = (a \sqcap b) \sqcup c$ (note that $a \sqcap c = c$)


is called the modular identity, and a lattice whose elements satisfy the 
modular identity is said to be a modular lattice. 

The modular identity is self-dual, which means that if $\sqcap$ is exchanged
with $\sqcup$ and $\le$ with $\ge$ the result is the same as before
(with interchange of the letters a and c, which is of
no importance).

The modular identity does not hold in every lattice,
as is seen from the example (m), Figure 6.

Fig. 6. (m)

However, in every lattice we do have the inequality


(10') $c \le a \rightarrow a \sqcap (b \sqcup c) \ge (a \sqcap b) \sqcup c$,


so that for the proof of (10) we need only show that


(10'') $c \le a \rightarrow a \sqcap (b \sqcup c) \le (a \sqcap b) \sqcup c$.
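
The failure of the modular identity can be checked on a five-element example. The sketch assumes that (m) is the lattice with n < c < a < e in which b is comparable only with n and e; this reading of the figure, and all names below, are assumptions.

```python
# Assumed model of lattice (m).
ELEMENTS = ["n", "c", "a", "b", "e"]

def leq(x, y):
    return x == y or x == "n" or y == "e" or (x, y) == ("c", "a")

def meet(x, y):
    lower = [w for w in ELEMENTS if leq(w, x) and leq(w, y)]
    return next(v for v in lower if all(leq(w, v) for w in lower))

def join(x, y):
    upper = [w for w in ELEMENTS if leq(x, w) and leq(y, w)]
    return next(v for v in upper if all(leq(v, w) for w in upper))

# Both sides of (10) for the triple a, b, c (note that c <= a).
left = meet("a", join("b", "c"))
right = join(meet("a", "b"), "c")

# All triples (x, y, z) with z <= x for which the modular identity fails.
failures = [(x, y, z) for x in ELEMENTS for y in ELEMENTS for z in ELEMENTS
            if leq(z, x) and meet(x, join(y, z)) != join(meet(x, y), z)]
```

Here left is "a" and right is "c", so only the inequality (10') holds, and (a, b, c) is the only triple in this lattice that violates (10).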

5.2. An important class of modular lattices is characterized by the
following theorem.


Theorem. The normal subgroups of a group form a modular lattice
with the ordering $\le$: "subgroup of."

For the proof we must determine the meaning of $a \sqcap b$ and $a \sqcup b$ in
this case.

If a, b are normal subgroups of e, then $a \cap b$ is also a normal subgroup
of e and is thus the largest subgroup of e contained in a and b; or in other
words:


$a \sqcap b = a \cap b$.


As for $a \sqcup b$, which is the smallest normal subgroup of e that contains
a and b, it consists exactly of those elements $\xi$ of e that can be represented
in the form


$\xi = \alpha\beta$, $\alpha \in a$, $\beta \in b$


(see Part B, Chapter 2, §6.4).
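For the proof of the theorem we must now prove (10''), or in other words


$c \le a \rightarrow [\xi \in a \sqcap (b \sqcup c) \rightarrow \xi \in (a \sqcap b) \sqcup c]$.

But $\xi \in a \sqcap (b \sqcup c)$ means that there exist $\alpha \in a$, $\beta \in b$, $\gamma \in c$ such that
$\xi = \alpha = \beta\gamma$.


Since $c \le a$, we have $\gamma \in a$, so that $\beta = \alpha\gamma^{-1} \in a$, or in other words
$\beta \in a \sqcap b$. Thus $\xi$ is represented as the product of an element $\beta \in a \sqcap b$
and an element $\gamma \in c$, so that $\xi \in (a \sqcap b) \sqcup c$.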


A commutative group is also called a module. By the above theorem the 
lattice of submodules of a module is a modular lattice, which explains the 
choice of the word “‘modular’’. 


5.3. The example of the nonmodular lattice (m)
is characteristic in the following sense.


Theorem. Every nonmodular lattice contains
a sublattice (m).


Proof. If M is nonmodular, it follows from
(10') that M contains at least three elements
a, b, c with $c < a$ and $(a \sqcap b) \sqcup c < a \sqcap (b \sqcup c)$.
Thus M also contains the following necessarily
distinct elements: $a \sqcap b$, $a \sqcup b$, $c \sqcap b$, $c \sqcup b$,
$a \sqcap (b \sqcup c)$, $(a \sqcap b) \sqcup c$. Their order relations are
represented in Figure 7. This order diagram
provides us with a sublattice T of type (m), so that it only remains to
prove that if two elements of T were equal, we would have $a \sqcap (b \sqcup c) \le
(a \sqcap b) \sqcup c$. From symmetry we need consider only the following cases:

Fig. 7


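a) $a \sqcap (b \sqcup c) = b \sqcup c$. Then it follows that
$a \sqcap (b \sqcup c) \sqcap b = (b \sqcup c) \sqcap b$,
and thus by the absorption law
$a \sqcap b = b$, $(a \sqcap b) \sqcup c = b \sqcup c = a \sqcap (b \sqcup c)$.


b) $b \sqcup c = b$. Then it follows that 1. $a \sqcap (b \sqcup c) = a \sqcap b$; 2. $c \le b$;
and from $c \le a$, we have $c \le a \sqcap b$ and therefore $(a \sqcap b) \sqcup c = a \sqcap b$.


c) $a \sqcap (b \sqcup c) = b$. Then it follows that
$(a \sqcap b) \sqcup c = [a \sqcap (a \sqcap (b \sqcup c))] \sqcup c = [a \sqcap (b \sqcup c)] \sqcup c \ge a \sqcap (b \sqcup c)$.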


5.4. By §5.3 a modular lattice is characterized by the fact that it con- 
tains no sublattice of type (m). This fact can be interpreted in the following 
way. If in a modular lattice two elements a, b have an upper neighbor in
common (which is then of course $a \sqcup b$), they also have a lower neighbor
in common ($a \sqcap b$).


Proof. If c is an element between $a \sqcap b$ and a: $a \sqcap b \le c \le a$, then
we assert that $c = a \sqcap b$ or $c = a$.


We form $c \sqcup b$. From the definition of $\sqcup$ and $c \le a$ it follows that
$b \le c \sqcup b \le a \sqcup b$, so that, since $a \sqcup b$ is an upper neighbor of b,
we have only the two possibilities: first $c \sqcup b = b$;
then $c \le b$, and since also $c \le a$, therefore $c \le a \sqcap b$;
and on the other hand, we had $a \sqcap b \le c$, so that
$c = a \sqcap b$; second $c \sqcup b = a \sqcup b$; then it follows
from (10), under the additional assumption $a \sqcap b \le c$,
that $a \sqcap (b \sqcup c) = (a \sqcap b) \sqcup c = c$, so that
$c = a \sqcap (a \sqcup b) = a$.

Definition. A lattice in which every two elements 
that have a common upper neighbor also have a 
common lower neighbor is said to be semimodular 
below. The concept of semimodular above is defined dually. 

Every modular lattice is semimodular below and above. Figure 4 
(page 492) shows a lattice that is semimodular below but not above, 
so that it is not modular. If we omit the twice-circled points we obtain 
a sublattice (m). If a lattice is semimodular below and above, we cannot 
at once conclude that it is also modular. For example, it may happen 
that no element has a neighbor, in which case the lattice is to be considered
as semimodular, although trivially so. The additional condition "of finite
length" is sufficient, but we do not give the proof here.

Fig. 8


5.5. The theorem on semimodularity can be extended to larger 
complexes of elements by the chain theorem of Dedekind, as follows. 
Definition. A proper chain between a and b


$a = x_0 < x_1 < \dots < x_l = b$


is called a maximal chain if it cannot be properly refined, i.e., if
$x_i \le y \le x_{i+1}$ implies either $y = x_i$ or $y = x_{i+1}$.

The chain theorem states: if in a lattice that is semimodular either above
or below there exists a maximal chain of length l joining a and b, then every
maximal chain between a and b is of length l.

We give here an outline of the proof (see Figure 9). Let


$a = x_0 < x_1 < \dots < x_l = b$
and
$a = y_0 < y_1 < \dots < y_m = b$


be two maximal chains between a and b. The assertion is that $m = l$.
For $l = 1$ this assertion is correct, and also for $l = 2$ on account of the
semimodularity. We now argue by complete induction. If $x_{l-1} = y_{m-1}$,
the proof is clear. Otherwise the two elements have the common upper
neighbor b, and thus, if we assume semimodularity below, they have the
common lower neighbor $z_{l-2}$ (which may coincide with $x_{l-2}$ or with
$y_{m-2}$, or with both of them). In any case the maximal chain
$a < \dots < x_{l-2} < x_{l-1}$ is of length $l - 1$, so that by the induction
hypothesis the maximal chain $a < \dots < z_{l-2} < x_{l-1}$ is of the same
length, and thus $a < \dots < z_{l-2}$ is of length $l - 2$. But then
$a < \dots < z_{l-2} < y_{m-1}$, and therefore $a < \dots < y_{m-2} < y_{m-1}$,
is also of length $l - 1$, so that $m = l$.

Fig. 9

The chain theorem stands in close analogy with the Jordan-Hölder
theorem (IB2, §12), but for the latter theorem we have the peculiar 
difficulty that it is based on the relation “normal subgroup of,” which 
is not transitive and thus is not an ordering. For details we refer again to 
Hermes [1]. 

5.6. For lattices of finite length the chain theorem allows us to
introduce the concept of dimension: to every element a we may assign
as its dimension $\delta a$ the length of a maximal chain from n to a. (The
reader will see that there must exist at least one such maximal chain.)
In the lattice Le the geometric dimension is given by $\delta a - 1$. In arguments
involving this concept an important role is played by the following
dimensional equation

$$\delta(a \cup b) + \delta(a \cap b) = \delta a + \delta b,$$

which is a consequence of the isomorphism theorem: in a modular lattice
the interval $a \cup b / a$ is isomorphic to the interval $b / a \cap b$.

Let us give the main part of the proof of this theorem. 

To every element x in $a \cup b / a$ the mapping $\varphi$ with $\varphi x = x \cap b$ assigns
an element in $b / a \cap b$, since $a \le x \le a \cup b$ implies

$$a \cap b \le x \cap b \le (a \cup b) \cap b = b.$$

This mapping is one-to-one. Every element y in $b / a \cap b$ is the image of
exactly one element in $a \cup b / a$, namely $y \cup a$. So we now prove the
following three statements:

1) $a \le y \cup a \le a \cup b$;
the first part is clear, and the second follows from $y \le b$.

2) $\varphi(y \cup a) = y$;
by the modular identity, we have $\varphi(y \cup a) = (y \cup a) \cap b = y \cup (a \cap b)$,
which is equal to y, since $a \cap b \le y$.

3) If x, z are elements of $a \cup b / a$ and if $\varphi x = \varphi z$, then $x = z$;
but from $x \cap b = z \cap b$ it follows that $(x \cap b) \cup a = (z \cap b) \cup a$, so that
by the modular identity we have $x \cap (b \cup a) = z \cap (b \cup a)$, and thus
$x = z$, since both are $\le b \cup a$.


The remainder of the proof of the isomorphism theorem is left to the 
reader, as well as the proof of the dimensional equation, which is now 
easy. 
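Both the isomorphism theorem and the dimensional equation can be tested numerically. The sketch below is an illustration under assumptions of our own: the modular lattice is taken to be the divisors of 360 with gcd as meet and lcm as join, the dimension is the number of prime factors counted with multiplicity, and the helper names are hypothetical.

```python
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def dimension(a):
    """Number of prime factors of a with multiplicity (length of a maximal chain from 1 to a)."""
    count, p = 0, 2
    while a > 1:
        while a % p == 0:
            a //= p
            count += 1
        p += 1
    return count


def interval(lower, upper):
    """The interval upper/lower: all x with lower | x | upper."""
    return [x for x in range(lower, upper + 1) if x % lower == 0 and upper % x == 0]


ELEMS = divisors(360)

dimension_equation_holds = all(
    dimension(lcm(a, b)) + dimension(gcd(a, b)) == dimension(a) + dimension(b)
    for a in ELEMS for b in ELEMS)


def phi_is_isomorphism(a, b):
    """Check that x -> gcd(x, b) maps (a join b)/a isomorphically onto b/(a meet b)."""
    top = interval(a, lcm(a, b))
    bottom = interval(gcd(a, b), b)
    bijective = sorted(gcd(x, b) for x in top) == sorted(bottom)
    inverse_ok = all(lcm(gcd(x, b), a) == x for x in top)      # y -> y join a undoes phi
    order_ok = all((y % x == 0) == (gcd(y, b) % gcd(x, b) == 0)
                   for x in top for y in top)
    return bijective and inverse_ok and order_ok
```

Calling `phi_is_isomorphism(a, b)` for all pairs of divisors exercises exactly the three statements proved above, together with the preservation of order in both directions.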

By means of the isomorphism theorem it can also be shown that two 
finite chains between the same elements have isomorphic refinements. 

It must be emphasized that a rigorous proof of the theorems in §§5.5 
and 5.6, for which we have only given outlines, depends upon a con- 
siderable number of details left unmentioned here. 


6. Projective Geometry 


In a k-dimensional projective space e the lattice Le of the linear sub- 
spaces is modular, complemented, and of finite length; in fact, the length 
of the chains is even bounded, namely, $\le k + 1$. Thus Le is also atomic.
The atoms are the points.

Conversely, let M be a modular complemented lattice of finite length
which is not empty and does not consist of the zero element n alone.
We assert that it can be interpreted as the lattice of the linear subspaces
of a projective space.

Since M is of finite length, there exist upper neighbors of zero, which
we shall call points. Then it is possible that M consists of n and one point p.
But if M contains other elements (at least one), we assert that there exist
at least two points. For p must have at least one complement $p'$. From $p'$
there is a maximal chain leading to n; this chain contains a point q, and
if we had $q = p$, it would follow that $p \cap p' = p \ne n$.

Two distinct points p, q have n as common lower neighbor, and thus
they have a common upper neighbor $g = p \cup q$. Consequently there
exist elements (at least one) of dimension 2; we call them lines. Through
two points there passes at least one line.

Let h be a second line through p, q ($p \le h$, $q \le h$). Then from the
definition of $\cup$ it follows that $g \le h$. But $\delta g = \delta h$, so that $g = h$.
Thus we have
(P1) Through two points there passes exactly one line.
We further assert:
(P2) If a, b, c, d are distinct points and if the lines $a \cup b$ and $c \cup d$ have
exactly one point (s) in common, then the lines $a \cup c$ and $b \cup d$ have a
point (t) in common (Figure 10).

Fig. 10
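For the proof we use the dimensional equation:

$$\delta((a \cup c) \cap (b \cup d)) = \delta(a \cup c) + \delta(b \cup d) - \delta(a \cup c \cup b \cup d) = 4 - \delta(a \cup c \cup b \cup d).$$

On the other hand,

$$\delta(a \cup b \cup c \cup d) = \delta(a \cup b) + \delta(c \cup d) - \delta((a \cup b) \cap (c \cup d)) = 4 - 1 = 3,$$

since we have assumed the existence of a point of intersection of $a \cup b$
and $c \cup d$. Thus $\delta((a \cup c) \cap (b \cup d)) = 1$, as was to be proved.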

If the four points are not all distinct, the theorem must be formulated 
somewhat differently, as follows: if the five points a, b, c, d,s have the 
property that a, b, s are collinear (i.e., lie on one line) and c, d, s are also 
collinear, then there exists a point t for which a, c, t are collinear and
also b, d, t. In the proof it is necessary to discuss the various cases.
Finally we show: 
(P3) On a line g there lie at least two distinct points.
In the first place there certainly exists at least one point p on g. A second
point is provided by a complement $p'$ of p. Naturally we will expect that
$p' \cap g = q$ is a second point on g. We have $q \le g$. Furthermore,

$$p \cup q = p \cup (p' \cap g)$$

is equal, by the modular identity, to

$$g \cap (p \cup p') = g.$$

Thus $q \ne n$ and $q \ne p$, since $g \ne p$; and also $q \ne g$, since otherwise
we would have $p \le q \le p'$. Consequently, q is a point on g distinct from p.

In geometry we usually require a sharpening of (P3) which is not a 
consequence of (P1)—(P3), namely: on every line there lie at least three 
distinct points. In lattice-theoretic language this axiom corresponds to 
the property that the lattice M cannot be represented as a direct product.
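As a concrete illustration of (P1) and of the sharpened form of (P3), we may look at the smallest projective plane. This sketch is our own example, not taken from the text: it assumes the lattice of subspaces of the three-dimensional vector space over the field with two elements, where the nonzero vectors are coded as the integers 1 to 7 and vector addition is bitwise XOR.

```python
from itertools import combinations

# Points: the one-dimensional subspaces {0, v}, represented by the nonzero vector v.
POINTS = list(range(1, 8))

# Lines: the two-dimensional subspaces {0, a, b, a + b}, represented by their three points.
LINES = {frozenset({a, b, a ^ b}) for a, b in combinations(POINTS, 2)}


def lines_through(p, q):
    """All lines containing both points p and q."""
    return [g for g in LINES if p in g and q in g]


# (P1): through two distinct points there passes exactly one line.
p1_holds = all(len(lines_through(p, q)) == 1 for p, q in combinations(POINTS, 2))

# Sharpened (P3): the set of occurring numbers of points per line.
points_per_line = {len(g) for g in LINES}

# In a plane any two distinct lines meet in exactly one point.
lines_meet = all(len(g & h) == 1 for g, h in combinations(LINES, 2))
```

Here every line carries exactly three points, so this lattice satisfies the stronger axiom mentioned above.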

In this way statements about modular complemented lattices of finite 
length are translated into incidence axioms of projective geometry. 
Further details of the lattice-theoretic interpretation of geometry, including 
geometries of infinite dimension, cannot be given here. 


Let us close with a remark about the significance of lattice theory. 
It is the task of the theory to formulate and prove general statements 
in the generality appropriate to them, without any unnecessary special 
assumptions. Thus the chain theorem does not depend on whether we 
are dealing with normal subgroups of a group, with the factors of a number 
or with other objects; it depends only on the existence of an ordering 
and of n and u, and on the modular identity and the concept of finite 
length. 


CHAPTER 10 


Some Basic Concepts for a Theory of Structure 


Introduction 


1. The theory of lattices, as described in the preceding chapter, is 
one of the technical means at our disposal for giving the full appropriate 
generality to statements of general import in mathematics. But if we 
wish to approach this problem systematically, we must first construct 
and investigate the necessary general concepts. We must not base our 
study on any special branch of mathematics, even though well-known 
mathematical facts will serve as guidelines for the construction of concepts; 
however, it will be necessary to make use of logic, and in fact the theory 
of structure is exactly the place where the importance of mathematical 
logic for mathematics as a whole is most clearly seen. 


2. Let us begin with the example of a group, i.e., of a set in which 
there is defined an operation (or composition, as we shall call it in the 
more general setting below) satisfying certain rules, known as the axioms 
of the group. Here the particular concrete group is defined by its actual 
elements, together with the operation. In this sense we speak of a con- 
figuration (or mathematical system; in German, Gebilde). But abstractly 
considered, it is characterized as a group by the existence of an operation 
and by the axioms, a characterization which has nothing whatever to do 
with the special set of elements defining the concrete group. In this sense 
we speak of structure. We now wish to describe these concepts in the 
greatest possible generality. Our discussion will be based on the work 
of P. Lorenzen [1]. 


1. Configurations


1.1. Definition 


A set M for whose elements (denoted by lower-case italic letters) there
is defined a finite sequence P of relations $R_1, \ldots, R_n$ (upper-case italic
letters, with or without indices) is called a configuration (M, P).


Note. This terminology has nothing to do with the concept of an “analytic
configuration” in the theory of functions of a complex variable. 


Examples. A set of points with the three-place relation: “the points 
x, y, z are collinear.” The set of natural numbers with the two-place
relation: “x is smaller than y.” The same set with the three-place relation:
“z is the sum of x and y.”

One may ask how a relation can be “given” in a concrete case. Here are
two examples. 1) In a finite set a multiplication, for instance, can be 
defined by setting down the product of any two elements in a table; 
and similarly, for any two-place relation in a finite set we can write down 
the pairs of elements for which the relation holds and the pairs for which 
it does not hold. 2) For the natural numbers addition and multiplication, 
for instance, are defined recursively (see IB1, §1, and IA, §10). 


1.2. The following kinds of relation are of particular importance:
1) Correspondences between elements of two sets $N_1$, $N_2$:
$R(x_1, x_2)$ cannot hold unless $x_1 \in N_1$, $x_2 \in N_2$.

In the sense of our definition we must take $M \supseteq N_1 \cup N_2$ if we are to
interpret the whole system as a configuration, which is not always desirable.
The correspondence is called a mapping (also a function; see IA, §8.4) if
it is unique with respect to the second element, i.e., if

(U) $R(x_1, x_2) \land R(x_1, x_2') \to x_2 = x_2'$.

In general we speak of a mapping from $N_1$ into $N_2$. If every element of
$N_1$ has an image, we speak of a mapping of $N_1$, and if every element of
$N_2$ is the image of an element of $N_1$, we speak of a mapping onto $N_2$.
In the present chapter we shall usually denote mappings by lower-case
Greek letters, and the existence of a correspondence $\varphi$ between $x_1$ and $x_2$
will then be written in the form $\varphi x_1 = x_2$ (cf. also IA, §8.4).

We note that by definition a mapping is always one-valued, so that
there will be no need to mention this fact from now on. Thus for a mapping
$\varphi$ we always have

$$\bigwedge_x \bigwedge_y (x = y \to \varphi x = \varphi y).$$

Of course, it can happen that $\varphi x = \varphi y$ and $x \ne y$. If this situation does
not occur, i.e., if

$$\bigwedge_x \bigwedge_y (\varphi x = \varphi y \to x = y),$$

we see that the passage from $\varphi x$ to x is also a mapping. As the inverse
mapping of $\varphi$ we denote it by $\varphi^{-1}$ and say that in this case $\varphi$ is invertible,
or one-to-one.


2) An (n + 1)-place relation is called an n-place inner composition
(German Verknüpfung, French composition) if for every n-tuple $x_1, \ldots, x_n$
of elements in M there exists exactly one element z in M for which
$R(x_1, \ldots, x_n, z)$ holds, i.e., if

(E) $\bigwedge_{x_1} \cdots \bigwedge_{x_n} \bigvee_z R(x_1, \ldots, x_n, z)$ and
(U) $\bigwedge_{x_1} \cdots \bigwedge_{x_n} \bigwedge_z \bigwedge_{z'} (R(x_1, \ldots, x_n, z) \land R(x_1, \ldots, x_n, z') \to z = z')$.

In other words: a composition is a mapping of $M \times M \times \cdots \times M = M^n$
into M. Thus we may write $z = \varphi(x_1, \ldots, x_n)$ instead of $R(x_1, \ldots, x_n, z)$.

For example, addition and multiplication are two-place compositions.
As a general symbol for a two-place inner composition it is customary
to adopt the Bourbaki symbol $\top$, and to write this symbol between the
arguments to which it applies: $x \top y = z$.
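For a finite set the conditions (E) and (U) can be tested directly once a relation is "given" as the list of tuples for which it holds. The following is a minimal sketch under that assumption; the function name and the two sample relations are hypothetical.

```python
from itertools import product


def is_inner_composition(M, R, n):
    """R is a set of (n+1)-tuples over M; test the conditions (E) and (U)."""
    for args in product(M, repeat=n):
        results = {t[n] for t in R if t[:n] == args}
        if len(results) == 0:      # (E) fails: no z with R(x_1, ..., x_n, z)
            return False
        if len(results) > 1:       # (U) fails: two different z for the same arguments
            return False
    return True


M = range(4)

# Three-place relation "z is the sum of x and y modulo 4".
R_SUM = {(x, y, (x + y) % 4) for x in M for y in M}

# Two-place relation "x is smaller than y", read as a candidate one-place composition.
R_LESS = {(x, y) for x in M for y in M if x < y}

sum_is_composition = is_inner_composition(M, R_SUM, 2)
less_is_composition = is_inner_composition(M, R_LESS, 1)
```

The relation "smaller than" fails both conditions: the element 3 has no image, and the element 0 has several.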


3) According to this definition the formation of the least upper bound, 
or of the greatest lower bound, in a complete lattice is not a composition; 
for instead of the n-tuple $x_1, \ldots, x_n$ we are dealing here with an arbitrary
subset of M, and the mapping in question is from the power set PM
into M. But for the purposes of the present chapter we shall include
a mapping of PM into M among the compositions, calling it a nonelementary
composition (since formation of the power set is not part of elementary 
logic). 

4) Another extension is necessary if we wish to include, for example, 
the S-multiplication in vector spaces (multiplication of a vector with a 
scalar) (IB3, §1.2). In addition to the set M (the vector space) we now 
have a domain of scalars S (in general we may speak of it as a domain
of operators $\Omega$) and to each pair $(\omega, x)$, $\omega \in \Omega$, $x \in M$, there is assigned
an element z of M. Under the relevant assumptions (E) and (U) a corre-
spondence of this sort is called an outer operation, and for it we use
the symbol $\perp$: $\omega \perp x = z$, again following Bourbaki. In place of x we could
also have an n-tuple or a subset of M. The domain $\Omega$ may also coincide
with M.


5) Finally, for all these compositions we could allow the set of images 
to be not M but some other set, as is the case, for example, with the tensor 
product and the outer product of vectors (IB3, §3.3). But in the present
chapter we shall exclude this generalization of the concept of a com-
position.

Definition. If all the defining relations of a configuration are composi- 
tions, the configuration is called a composition-configuration or an abstract 
algebra (IA, §8.5). 

Configurations of other kinds are, for example, the ordered sets (defined
by an ordering), and the topological spaces, defined by a relation “U is a
neighborhood of x” between an element x of X and an element U of a
subset $\mathfrak{U}$ of PX ($M = X \cup \mathfrak{U}$, cf. 1.2, 1)); in this case X may, for example,
be the set of points of a plane, where $\mathfrak{U}$ is the set of open circular disks.
It is remarkable that a dual group (IB9, §1) is a composition-configura-
tion, whereas a lattice is not.


1.3. Homomorphism and Isomorphism 


Some of the most usual (algebraic) concepts have to do with configura- 
tions, and others with the notion of structure defined below. In particular 
the concepts of homomorphism, isomorphism, and congruence have to do 
with configurations. The first two of these deal with mappings of a con-
figuration (M, P) into another configuration (M', P'). Here it is not
only the elements of M that are mapped but also the defining relations.
For the latter it is assumed that every relation $R_i$ in P corresponds to
exactly one relation $R_i'$ in P' and conversely. We then say that the con-
figurations are homologous. For the most part we shall be interested only
in the subset of M that actually undergoes a mapping; in other words,
we usually deal with a mapping of (M, P) into (M', P').

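A mapping $\varphi$ of (M, P) into (M', P') is called a homomorphism if for
every relation

$$R(x_1, \ldots, x_n) \to R'(\varphi x_1, \ldots, \varphi x_n).$$

For a two-place inner composition $\top$ this condition means that

$$x \top y = z \to \varphi x \top \varphi y = \varphi z$$
or
$$\varphi x \top \varphi y = \varphi(x \top y).$$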


For an outer composition we must take account of the domain of 
operators, which is done in the same way as for the set of relations: 
we assume that there exists a one-to-one correspondence between the 
elements of Q and those of Q’, or we may at once assume that the two 
configurations have the same operator domains. The condition for $\varphi$ to
be a homomorphism is, in the simplest case:

$$\varphi(\omega \perp x) = \omega \perp' \varphi x.$$
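For finite composition-configurations the homomorphism condition for a two-place inner composition can be verified by running through all pairs. The sketch below is an illustration with assumed data: the integers modulo 12 under addition, mapped to residues modulo 4 and modulo 5; all names are hypothetical.

```python
def is_homomorphism(phi, M, op, op_prime):
    """Test phi(x T y) == phi(x) T' phi(y) for all x, y in M."""
    return all(phi[op(x, y)] == op_prime(phi[x], phi[y]) for x in M for y in M)


def add_mod(n):
    """Addition modulo n as a two-place composition."""
    return lambda x, y: (x + y) % n


Z12 = range(12)

reduce_mod_4 = {x: x % 4 for x in Z12}   # 4 divides 12
reduce_mod_5 = {x: x % 5 for x in Z12}   # 5 does not divide 12

mod4_is_hom = is_homomorphism(reduce_mod_4, Z12, add_mod(12), add_mod(4))
mod5_is_hom = is_homomorphism(reduce_mod_5, Z12, add_mod(12), add_mod(5))
```

Reduction modulo 4 respects addition modulo 12, whereas reduction modulo 5 does not: for x = y = 7 the two sides of the condition are 2 and 4.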

Definition. If $\varphi$ is a one-to-one mapping of (M, P) onto (M', P') such
that its inverse is also a homomorphism, then $\varphi$ is called an isomorphism.

This last condition (namely that the inverse mapping must also be a
homomorphism) does not follow, in general, from the other conditions, as
may be shown by the example of the mapping of an ordered set with the order
diagram I (Figure 1) onto the set with the order diagram II. But for
compositions the latter condition can be deduced from the others, since
if we are given elements $x'$, $y'$, there must exist x, y with $x' = \varphi x$,
$y' = \varphi y$, and thus we have

$$\varphi^{-1}(x' \top y') = \varphi^{-1}(\varphi x \top \varphi y) = \varphi^{-1}(\varphi(x \top y)) = x \top y = \varphi^{-1}x' \top \varphi^{-1}y'.$$

Fig. 1
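The failure of the inverse to be a homomorphism can be seen on two-element ordered sets. The example below is our own and is not claimed to reproduce the diagrams of Figure 1: it assumes an unordered pair mapped onto a chain, with each order given as the set of pairs for which it holds.

```python
def is_homomorphism(phi, R, R_prime):
    """Test R(x_1, ..., x_n) -> R'(phi x_1, ..., phi x_n) for a relation given as a set of tuples."""
    return all(tuple(phi[x] for x in t) in R_prime for t in R)


# Order I: a and b are incomparable (only the reflexive pairs hold).
ORDER_I = {("a", "a"), ("b", "b")}

# Order II: a chain with a' below b'.
ORDER_II = {("a'", "a'"), ("b'", "b'"), ("a'", "b'")}

phi = {"a": "a'", "b": "b'"}                 # one-to-one and onto
phi_inverse = {image: x for x, image in phi.items()}

forward_is_hom = is_homomorphism(phi, ORDER_I, ORDER_II)
inverse_is_hom = is_homomorphism(phi_inverse, ORDER_II, ORDER_I)
```

The mapping preserves every relation of order I, but its inverse would have to carry the pair (a', b') to a relation between a and b that does not hold.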


Definition. A homomorphism of M into M itself is called an endo- 
morphism, and an isomorphism of M onto M itself is called an auto- 
morphism. 


1.4. The Automorphism Group 


Successive application of two automorphisms produces a uniquely 
determined automorphism and is thus a composition. It is easy to see 
that the automorphisms of a configuration form a group under this 
composition, the automorphism group of the configuration. 

Conversely, we have the theorem: For every group G there exists a 
configuration (M, P), and in fact a composition-configuration, whose 
automorphism group is isomorphic to G. 

Birkhoff’s proof.1 Let the elements of G be denoted by lower-case
Latin letters. The set M will consist of the elements of G and the pairs
of elements of G:

$$M = G \cup (G \times G).$$

Let the defining relations be: an outer composition with G as operator
domain, defined by

$$c \perp a = c, \qquad c \perp (a, b) = a,$$

and an inner composition $\top$, defined by

$$(a, b) \top (a', b') = b\,b'^{-1},$$
$$a \top a' = a \top (a', b') = (a, b) \top b' = 1,$$

or in words: when $\top$ is applied to two pairs of elements, it produces the
quotient of the two second elements, and otherwise $\top$ always produces 1
(the neutral element of G).

1 On the structure of abstract algebras. Proc. Cambridge Phil. Soc. 31, 1935,
pp. 434-454.

Assertion. The automorphism group of the configuration defined in
this way is isomorphic to G. For the proof we must put every auto-
morphism $\varphi$ into one-to-one correspondence with an element of G in
such a way that the correspondence is an isomorphism. To this end
we examine the effect of an automorphism $\varphi$ on the elements of M.


1) If $c \in G$, then for every automorphism $\varphi$ we have: $\varphi c = c$. (G is
elementwise fixed under every automorphism.)

Proof. From $c \perp a = c$ it follows that

$$\varphi c = \varphi(c \perp a) = c \perp \varphi a = \begin{cases} c, & \text{if } \varphi a = a' \in G; \\ a', & \text{if } \varphi a = (a', b') \in G \times G. \end{cases}$$

In each case we have $\varphi c \in G$; since this statement holds for every element
of G, the second case does not arise.

2) $\varphi(a, b) = (a', b')$, or in words: the image of a pair is again a pair.
For if we had $\varphi(a, b) = c$, then for the inverse mapping we would have
$\varphi^{-1}c = (a, b)$, in contradiction to 1).

More precisely we have: $\varphi(a, b) = (a, b')$; i.e., an automorphism leaves
the first element of a pair unchanged.

Proof. Let $\varphi(a, b) = (a', b')$.
From $c \perp (a, b) = a$ and $\varphi a = a$ it follows that

$$a = \varphi a = \varphi(c \perp (a, b)) = c \perp \varphi(a, b) = c \perp (a', b') = a'.$$


3) By $\varphi(1, 1) = (1, c)$ an element c in G is assigned to each automorphism.
We must show that conversely the automorphism is uniquely defined
by c. For this purpose we need only express the $b'$ in $\varphi(a, b) = (a, b')$
in terms of a, b, and c.

From $(a, b) \top (1, 1) = b$ it follows that

$$b = \varphi b = \varphi(a, b) \top \varphi(1, 1) = (a, b') \top (1, c) = b'c^{-1},$$

so that $b' = bc$.
Up to now we do not know whether any automorphism of our con-
figuration actually exists. We have only shown: if $\varphi$ is an automorphism,
then $\varphi$ determines an element $c \in G$ by the equation $\varphi(1, 1) = (1, c)$,
and for arbitrary a, b

(*) $\varphi a = a$, $\varphi(a, b) = (a, bc)$.

But it is easy to verify that:

1) For every $c \in G$ the mapping of M into itself defined by (*) is an
automorphism.

2) The correspondence between $\varphi$ and c is one-to-one.

3) If the group elements $c_1$, $c_2$ correspond to two given automorphisms,
then the group element $c_1 c_2$ corresponds to the automorphism deter-
mined by successive application of the given automorphisms.
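For a very small group the whole assertion can be confirmed by exhaustive search. The following sketch is an illustration under our own assumptions: G is the cyclic group of order n written additively (so the neutral element is 0 and the "quotient" is a difference), the outer composition is taken as c ⊥ a = c, c ⊥ (a, b) = a, and all names are hypothetical.

```python
from itertools import permutations, product


def build_configuration(n):
    """M = G united with G x G for the cyclic group G = Z_n (neutral element 0)."""
    G = list(range(n))
    M = G + list(product(G, G))

    def outer(c, x):
        # c ⊥ a = c for a in G, and c ⊥ (a, b) = a for a pair
        return x[0] if isinstance(x, tuple) else c

    def inner(x, y):
        # (a, b) T (a', b') = b - b'; in all other cases the neutral element
        if isinstance(x, tuple) and isinstance(y, tuple):
            return (x[1] - y[1]) % n
        return 0

    return G, M, outer, inner


def is_automorphism(phi, G, M, outer, inner):
    """phi is a dict M -> M; test bijectivity and both homomorphism conditions."""
    if set(phi.values()) != set(M):
        return False
    outer_ok = all(phi[outer(c, x)] == outer(c, phi[x]) for c in G for x in M)
    inner_ok = all(phi[inner(x, y)] == inner(phi[x], phi[y]) for x in M for y in M)
    return outer_ok and inner_ok


def phi_c(c, n, M):
    """The mapping (*): a -> a and (a, b) -> (a, b + c)."""
    return {x: ((x[0], (x[1] + c) % n) if isinstance(x, tuple) else x) for x in M}


# G = Z_2: M has 6 elements, so all 720 bijections can be examined.
G2, M2, outer2, inner2 = build_configuration(2)
automorphisms = [phi for phi in (dict(zip(M2, perm)) for perm in permutations(M2))
                 if is_automorphism(phi, G2, M2, outer2, inner2)]

# G = Z_3: here we only test that every mapping (*) is an automorphism.
G3, M3, outer3, inner3 = build_configuration(3)
star_maps_ok = all(is_automorphism(phi_c(c, 3, M3), G3, M3, outer3, inner3) for c in G3)
```

The search over Z_2 finds exactly the two mappings of the form (*), one for each group element, in agreement with the assertion.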


1.5. Congruence Relations 


A congruence relation $\equiv$ is an equivalence relation that is consistent
with the defining relations R of the configuration; that is, for a con-
gruence relation we have

$$R(x_1, \ldots, x_{n+1}) \land x_1 \equiv x_1' \land \cdots \land x_{n+1} \equiv x_{n+1}' \to R(x_1', \ldots, x_{n+1}').$$

If R is an inner composition, so that $R(x_1, \ldots, x_n, z)$ can be written in
the form

$$z = \varphi(x_1, \ldots, x_n),$$

then the condition for consistency can be written:

$$x_1 \equiv x_1' \land \cdots \land x_n \equiv x_n' \to \varphi(x_1, \ldots, x_n) \equiv \varphi(x_1', \ldots, x_n'),$$

and in the case of an outer two-place composition:

$$x \equiv y \to \omega \perp x \equiv \omega \perp y.$$


A congruence relation gives rise, as an equivalence relation, to a
partition into classes. Let the class defined by x be denoted by $\kappa x$. From
the set $\kappa M$ of classes we construct a configuration homologous to (M, P)
by defining

$$R(\kappa x_1, \ldots, \kappa x_n) \leftrightarrow R(x_1, \ldots, x_n)$$

or for compositions $\varphi(\kappa x_1, \ldots, \kappa x_n) = \kappa(\varphi(x_1, \ldots, x_n))$.

It is possible to make these definitions only because the right-hand 
sides depend, in view of the condition for consistency, only on the classes 
and not on our choice of representatives; that is, if we choose other 
representatives from the same classes, we obtain the same results. It is 
easy to see that the mapping $x \to \kappa x$ is a homomorphism. Thus every
congruence relation corresponds to a homomorphism. 
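The condition for consistency and the construction of the configuration of classes can be tried out on a finite example. In this sketch, which rests on our own choice of data, M is the set of integers modulo 6 under addition, a partition is described by a dictionary sending each element to a label of its class, and the function names are hypothetical.

```python
def is_congruence(M, op, cls):
    """Test: x ≡ x' and y ≡ y' imply x T y ≡ x' T y' (cls maps elements to class labels)."""
    return all(cls[op(x, y)] == cls[op(x2, y2)]
               for x in M for y in M for x2 in M for y2 in M
               if cls[x] == cls[x2] and cls[y] == cls[y2])


def quotient_table(M, op, cls):
    """Composition on the classes, defined through representatives."""
    return {(cls[x], cls[y]): cls[op(x, y)] for x in M for y in M}


Z6 = range(6)


def add6(x, y):
    return (x + y) % 6


classes_mod_3 = {x: x % 3 for x in Z6}     # {0, 3}, {1, 4}, {2, 5}
classes_pairs = {x: x // 2 for x in Z6}    # {0, 1}, {2, 3}, {4, 5}

mod3_is_congruence = is_congruence(Z6, add6, classes_mod_3)
pairs_is_congruence = is_congruence(Z6, add6, classes_pairs)
quotient = quotient_table(Z6, add6, classes_mod_3)
```

Only for the first partition is the result independent of the representatives; the resulting table on the classes is addition modulo 3. In the second partition 0 and 1 lie in one class, but 0 + 0 and 1 + 1 do not.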

For a composition-configuration we can prove the converse. To every
homomorphism $\varphi$ of (M, P) into the configuration (M', P') there corresponds a
congruence relation, namely, the relation $\equiv$ defined by $a \equiv b \leftrightarrow \varphi a = \varphi b$.
For we need only prove that such a relation is reflexive, transitive, and
symmetric and is consistent with the compositions. As an example,
we will demonstrate the consistency for the case of a two-place inner
composition:

Hypothesis: $x \equiv x'$, i.e., $\varphi x = \varphi x'$; $y \equiv y'$, i.e., $\varphi y = \varphi y'$.
Assertion: $x \top y \equiv x' \top y'$, i.e., $\varphi(x \top y) = \varphi(x' \top y')$.

Proof: $\varphi(x \top y) = \varphi x \top \varphi y = \varphi x' \top \varphi y' = \varphi(x' \top y')$.

We have seen that to every congruence relation for groups there corre- 
sponds a normal subgroup, and for rings an ideal; the corresponding 
question for configurations, namely, whether to every congruence relation 
there corresponds a subconfiguration, is a question of an entirely different 
sort, to be answered affirmatively only for certain very special configura- 
tions. 


2. Structure 


2.1. Definition 


A set is made into a configuration by means of defining relations; 
and we then say that the set carries a structure. Our present purpose is 
to give a definition of structure. Certainly this concept must be independent 
of the particular set under consideration, much in the same way as the 
concept of an architectural “style” is independent of any particular 
edifice erected in the given style. The concept of “structure” will refer 
to the properties of the defining relations. 

For example, let us consider a defining relation R which is three-place 
and has the following four properties: 


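(1) $\bigwedge_x \bigwedge_y \bigvee_z R(x, y, z)$,
(2) $\bigwedge_x \bigwedge_y \bigwedge_z \bigwedge_{z'} (R(x, y, z) \land R(x, y, z') \to z = z')$;

so that R is a composition. We write $xy = z$ for $R(x, y, z)$.

(3) $\bigwedge_x \bigwedge_y \bigwedge_z ((xy)z = x(yz))$,
(4) $\bigvee_e \bigwedge_x (ex = x \land \bigvee_{\bar{x}} \bar{x}x = e)$.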


(In other words, R defines the concept of a group, cf. IB2, §1.1.) 

These formulas are constructed from individual variables and the
relation symbols (of which there is here only one) with the logical particles
$\land$, $\lor$, $\to$ and the quantifiers $\bigwedge$, $\bigvee$, where the quantifiers are applied only
to the individual variables, not to the relation symbols, and all the
individual variables are bound by the quantifiers. These formulas become
statements when the individual variables are replaced by the elements
of a set M and the relation symbols by the defining relations of a con-
figuration. Such a system of formulas is called an axiom system. (In this
choice of terminology there is no reference to the philosophical meaning
of the word “axiom” or to the question of self-evidence.)

If the formulas of an axiom system $\Sigma$ become valid statements when
a configuration (M, P) is inserted in the way just described, the con-
figuration (M, P) is said to be a model of the axiom system $\Sigma$. We say
that (M, P) has a structure, which is described by $\Sigma$.

If an axiom system $\Sigma'$ is logically equivalent to an axiom system $\Sigma$
(that is, if $\Sigma'$ follows from $\Sigma$ and conversely) we say that $\Sigma'$ describes
the same structure as $\Sigma$, and thus we define a structure as a class of logically
equivalent axiom systems.
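Whether a finite configuration is a model of the four group axioms above can be decided mechanically by letting the quantified variables run through the set. The sketch below is an illustration with assumed data; the relation is given as a set of triples, and the function and variable names are hypothetical.

```python
from itertools import product


def is_group_model(M, R):
    """Test the axioms (1)-(4) for a three-place relation R given as a set of triples."""
    M = list(M)
    # (1) and (2): for all x, y there is exactly one z with R(x, y, z)
    table = {}
    for x, y in product(M, repeat=2):
        results = [z for z in M if (x, y, z) in R]
        if len(results) != 1:
            return False
        table[x, y] = results[0]
    # (3): associativity
    if any(table[table[x, y], z] != table[x, table[y, z]]
           for x, y, z in product(M, repeat=3)):
        return False
    # (4): some e is left neutral, and every x has a left inverse with respect to e
    return any(all(table[e, x] == x and any(table[xb, x] == e for xb in M) for x in M)
               for e in M)


Z4 = range(4)
R_ADD = {(x, y, (x + y) % 4) for x in Z4 for y in Z4}   # addition modulo 4
R_MUL = {(x, y, (x * y) % 4) for x in Z4 for y in Z4}   # multiplication modulo 4

addition_is_model = is_group_model(Z4, R_ADD)
multiplication_is_model = is_group_model(Z4, R_MUL)
```

The same set thus carries the group structure under one defining relation and not under the other: multiplication modulo 4 satisfies (1) to (3), but 0 has no inverse.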

In general, a configuration will carry various structures; for example, 
a configuration of the rational numbers with the compositions of addition 
and multiplication carries the structures of a ring, an integral domain, 
and a field. One might be tempted to try to find a “comprehensive” 
structure, from which would follow all valid statements about this 
configuration. But it is a consequence of the incompleteness theorem of 
Gödel (IA, §10.5) that for the configuration of the rational numbers
there cannot exist an axiom system of this sort that is finite or recursively 
enumerable. 

Another kind of similarity between structures is illustrated by the 
concepts of lattice and dual group (IB9, §1). Here we have two types of 
defining relations, on the one hand the order $\le$, and on the other the
compositions $\cap$, $\cup$. These relations can be defined in terms of each other,
as follows.

On the one hand: $a \le b \leftrightarrow a \cap b = a$,

on the other: $d = a \cap b \leftrightarrow d \le a \land d \le b \land \bigwedge_{d'} ((d' \le a \land d' \le b) \to d' \le d)$.

With these definitions the two systems of axioms are seen to be equivalent.
Every model of either of them is also a model of the other. In the present
chapter we cannot enter upon a precise general description of this situation.
Two structures that are similar in this way are said to belong to the same
structure type.
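The passage between the two kinds of defining relation can be followed on a small model. The sketch below is our own illustration: it assumes the divisors of 12, with divisibility as the order and the greatest common divisor as the meet, and the function names are hypothetical.

```python
from math import gcd

ELEMS = [1, 2, 3, 4, 6, 12]


def divides(a, b):
    """The given order: a <= b means that a divides b."""
    return b % a == 0


def leq_from_meet(a, b):
    """Order recovered from the composition: a <= b iff (a meet b) = a."""
    return gcd(a, b) == a


def meet_from_order(a, b, leq):
    """Composition recovered from the order: the greatest common lower bound of a and b."""
    lower = [d for d in ELEMS if leq(d, a) and leq(d, b)]
    (greatest,) = [d for d in lower if all(leq(e, d) for e in lower)]
    return greatest


order_agrees = all(leq_from_meet(a, b) == divides(a, b) for a in ELEMS for b in ELEMS)
meet_agrees = all(meet_from_order(a, b, divides) == gcd(a, b)
                  for a in ELEMS for b in ELEMS)
```

The unpacking in `meet_from_order` would raise an error if a greatest lower bound failed to exist or to be unique, so a successful run also confirms that the order determines the meet.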


2.2. Subconfigurations


In our discussion of lattice theory (IB9, §1, II) we at first used the two
names “dual group” and “lattice” to distinguish the two structures
defined in different ways, namely, the dual group by the two (dual)
compositions and the lattice by the order relation, and it is desirable
to have these two distinct names. Then, as is customary, we have used
the name “lattice” for the structure type to which both the structures
belong.

But now the subsets which under the given compositions form a dual 
group (they are usually called sublattices) are not always the same as 
the subsets (called subbands) that form a lattice (in the sense of the 
structure) with respect to the given order. Thus we must define the concept 
“subconfiguration” in terms of the structure and not in terms of the 
structure type. 
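The difference between the two kinds of subset shows up already among the divisors of 30. The following sketch is an illustration with assumed data (gcd as meet, lcm as join, divisibility as order); the function names are hypothetical.

```python
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def closed_under_compositions(N):
    """Sublattice test: N contains gcd and lcm of any two of its elements."""
    return all(gcd(a, b) in N and lcm(a, b) in N for a in N for b in N)


def has_greatest(candidates, leq):
    """Is there an element of candidates that is above all the others with respect to leq?"""
    return any(all(leq(e, d) for e in candidates) for d in candidates)


def is_lattice_under_order(N):
    """Subband test: within N every pair has a greatest lower and a least upper bound."""
    below = lambda x, y: y % x == 0      # x divides y
    above = lambda x, y: x % y == 0      # reversed order, used for least upper bounds
    for a in N:
        for b in N:
            lower = [d for d in N if a % d == 0 and b % d == 0]
            upper = [d for d in N if d % a == 0 and d % b == 0]
            if not has_greatest(lower, below) or not has_greatest(upper, above):
                return False
    return True


N1 = {1, 2, 3, 30}    # a lattice under divisibility, but lcm(2, 3) = 6 is missing
N2 = {1, 2, 3, 6}     # closed under gcd and lcm
N3 = {2, 3}           # no common lower bound inside the subset
```

The subset `N1` is a subband without being a sublattice: within `N1` the least upper bound of 2 and 3 is 30, whereas in the full lattice it is 6.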


Definition. Let (M, P) be a configuration carrying the structure $\Sigma$.
Assume that under the same relations as for M and under outer composi-
tions with the same operator domains a given subset N satisfies the axiom
system $\Sigma$; then (N, P) is said to be a subconfiguration of (M, P) with
respect to $\Sigma$.


Remarks. 1) The phrase “with respect to $\Sigma$” is omitted if no misunder-
standing can arise.

2) The conditions (E), (U) that characterize a relation as a composition,
belong to $\Sigma$. If the condition (U) is satisfied in M for a given relation R,
it is also satisfied in any subset N of M. If the condition (E) is satisfied
in N with respect to R, then N is said to be closed with respect to the
composition R; e.g., for a two-place inner composition $\top$ the subset N
is closed if $a, b \in N$ implies $a \top b \in N$. For a composition-configuration
the closedness of a subset N is a necessary but not sufficient condition 
for N to be a subconfiguration; for example, the integers with addition 
as the composition form a group for which the set of positive integers 
is closed with respect to addition but is not a subgroup. 


3) Every two-place inner composition in a set M can be regarded as 
an outer composition with M as the operator domain, and this may be 
done in two ways, depending on whether we regard the left or the right 
factor as an operator. Corresponding possibilities exist for many-place 
compositions, which we need not discuss here. If we interpret multiplica- 
tion in a ring as an outer composition in this sense, then the subconfigura- 
tions of this structure are not the subrings but the left and right ideals, 
respectively (cf. IB6, §2.5). 

4) In a group (G, ·) the forming of inner automorphisms can be regarded
as an outer composition with G as operator domain: $g \perp x = g^{-1}xg$.
In this case, and more generally for any set of operators that give rise
to endomorphisms, we speak of a group with operators. The subcon-
figurations are called admissible subgroups. In the present case these are
exactly the normal subgroups (IB2, §6.1).


Vector spaces are also groups with operators. Here the group composition 
is addition; the S-multiplication gives rise to endomorphisms (cf. IB3, §1.2); the 
admissible subgroups are the vector subspaces. Since the theorem in IB9, §5.2 
also holds for admissible subgroups of a group with operators, and since in a 
commutative group every subgroup is a normal subgroup, it follows that the 
lattice of vector subspaces of a vector space is modular. 


The importance of the remarks 3) and 4) lies in the fact that they 
illustrate the great generality of the theorem of §2.3 (that the subcon- 
figurations... form a complete lattice). 

In each case where we have used the names “group,” “ring,” “field”
the reader should consider whether we have been referring to a structure 
or to a structure type. The answer may be different from case to case, 
and it appears that up to the present no one has given an interpretation 
of the situation that will command universal assent. 


2.3. The Lattice of Subconfigurations 


In the set U(G) of subconfigurations of a configuration (G, P) an order 
is defined by inclusion. If the intersection of arbitrarily many subcon- 
figurations is again a subconfiguration, it follows from the theorem 
on the least upper bound (IB9, §2.1) that U is a complete lattice. A structure 
which is “bequeathed” by any set of subconfigurations to their meet
is said to be meet-hereditary, and we then have the following theorem. 


Theorem. The subconfigurations of a configuration with meet-hereditary 
structure form a complete lattice. 


We now ask: which structures, or in other words which sets of axioms, 
are meet-hereditary? Here the answer depends on the logical form of 
the axioms. For example, the following are meet-hereditary: 


1) Axioms in which only $\bigwedge$-quantifiers occur; for if such an axiom holds
for G, then it holds for every subset of G.

Here it must be assumed that the axiom has already been brought
into the so-called prenex normal form, in which all the quantifiers appear
in non-negative form at the beginning of the formula and apply throughout
up to the end of the formula, a situation that can always be attained by
logical transformations. (Otherwise we could always arrange to have
only $\bigwedge$-quantifiers by replacing $\bigvee_x A(x)$ with $\neg \bigwedge_x \neg A(x)$.)

2) Axioms in whose prenex normal form the quantifiers appear in the
successive order

(E') $\bigwedge_x \bigvee_y \bigwedge_z A(x, y, z)$, $A(x, y, z)$ free of quantifiers,

only if the uniqueness statement

(U') $\bigwedge_x \bigwedge_{y_1} \bigwedge_{y_2} \bigwedge_z [A(x, y_1, z) \land A(x, y_2, z) \to y_1 = y_2]$

is valid, i.e., is either an axiom or a consequence of the axioms.


Thus we are asserting that if $N_1$, $N_2$ are subsets of M with $N_1 \cap N_2 = D$
and if (E') holds when the domains of the variables are restricted to $N_1$ or to $N_2$,
then (E') also holds when the domains of variables are restricted to D, i.e.,

$$\bigwedge_{x \in D} \bigvee_{y \in D} \bigwedge_{z \in D} A(x, y, z).$$

Proof. If we assume $x \in D$, we have

$$\bigvee_{y_1 \in N_1} \bigwedge_{z \in N_1} A(x, y_1, z) \quad \text{and} \quad \bigvee_{y_2 \in N_2} \bigwedge_{z \in N_2} A(x, y_2, z).$$

If for z we choose an element in D, it follows that

$$A(x, y_1, z) \land A(x, y_2, z),$$

so that (U') implies $y_1 = y_2 \in N_1 \cap N_2$.

The proof is also valid if the x, y, z are replaced by systems $x_1, \ldots, x_p$;
$y_1, \ldots, y_q$; $z_1, \ldots, z_r$.

The proof becomes simpler if z and the corresponding quantifier do not occur. 
For then in place of (E’), (U’) we have exactly (E), (U) as on page 510, and thus 
we obtain: closedness with respect to operations is meet-hereditary. 

If x and the corresponding quantifier do not occur, we must take account
of the possibility that D is empty, and then the proof runs somewhat differently.
From (E') and (U') we have: there exists exactly one y in M with the property
$\bigwedge_{z \in M} A(y, z)$. The restriction of (E') to $N_i$ ($i = 1, 2$) means that
$\bigvee_{y_i \in N_i} \bigwedge_{z \in N_i} A(y_i, z)$. Thus $N_i$ is not empty; consequently there exists a $z_i$ in $N_i$
with $A(y, z_i)$ and $A(y_i, z_i)$. So by (U') we have $y = y_i$ for $i = 1, 2$, and thus
$y \in D$, which completes the proof of (E') for D. This result shows, for example,
that the existence of the neutral element of a group is meet-hereditary.


Groups, rings, fields, and lattices are examples of meet-hereditary 
structures. So it is natural to ask what properties of a configuration 
correspond to given properties of the lattice of its subconfigurations. 
For the particular case of groups, this question has been the subject of 
many profound investigations, of which M. Suzuki has recently given a 
connected account.² 

The above theorem admits the following converse: if V is a complete 
lattice, there exists a configuration with meet-hereditary structure whose 
subconfigurations form a lattice isomorphic to V. 


2 Structure of a group and the structure of its lattice of subgroups. Ergebnisse d. 
Math., Neue Folge, Heft 10, 1956. 

Birkhoff’s proof.³ We take $M = V$ and define an outer composition, 
with $V$ as operator domain, which to an element $a$ of $V$ ($= \Omega$) and a 
subset $B \subset V$ assigns an element $x \in V$: 

$a \top B = a \cap \bigcup_{b \in B} b$. 

Since $V$ is a complete lattice, this correspondence is actually a composition, 
i.e., the conditions (E) and (U) are satisfied for the corresponding three- 
place relation. We now take $\Sigma$ to consist of these axioms alone. Then 
the configuration $(M, \top)$ has a meet-hereditary structure, so that its 
subconfigurations form a complete lattice. We assert that this lattice is 
isomorphic to $V$. The proof runs as follows. 

The subset $N \subset M$ is a subconfiguration if 

$a \in V \wedge B \subset N \rightarrow a \cap \bigcup_{b \in B} b \in N$. 

We consider $\bigcup_{y \in N} y = c$. Since $V$ is complete, such an element $c$ exists in $V$. 
In fact, $c$ even belongs to $N$, since $c = c \cap c = c \cap \bigcup_{y \in N} y$, and $N \subset N$. 
By the definition of the least upper bound, $x \in N \rightarrow x \le c$; and conversely, 
$x \le c \rightarrow x \in N$ since 

$x = x \cap c = x \cap \bigcup_{y \in N} y$. 

Consequently: for every subconfiguration $N$ there exists an element $c$ 
with the property that $N$ consists exactly of the elements $x \le c$. This set 
$A_c$ is called the segment of $c$. 

Conversely, every segment $A_c$ is also a subconfiguration. For if $B \subset A_c$, 
we have: $b \in B \rightarrow b \le c$, so that for every $a \in V$: 

$a \cap \bigcup_{b \in B} b \le c$, i.e., $\in A_c$. 

Obviously there exists a one-to-one correspondence between the 
segments $A_c$ and the elements $c$ of $V$ such that 

$A_c \subset A_d \leftrightarrow c \le d$. 

It follows that the segments, and consequently also the subconfigurations, 
form a lattice isomorphic to $V$, so that the proof is complete. 

In the present chapter we have not been able to give more than the 
first steps in a theory of structure. In the theorems on the automorphism 
group and the lattice of subconfigurations we have tried to prove some 
of the first results. They illustrate the importance of the concepts of 
group and lattice. 


3 “On the Combinations of Subalgebras,” Proc. Cambridge Phil. Soc. 29, 1933, 
pp. 441-464. 

Let us mention some other questions without attempting to answer 
them here. 

How can other configurations be constructed from a given configura- 
tion? For example, how should subconfigurations be constructed; or 
direct products? (On the same question for lattices see IB9, §2.3.) Which 
of the properties of a configuration are preserved in passing to a sub- 
configuration or a direct product; or to a homomorphic configuration? 

How can we classify systems of axioms, i.e., structures, on the basis 
of the parts of logic that are employed? For example, Lorenzen distin- 
guishes pure-elementary structures (essentially those that we have used 
here, but not including the nonelementary compositions), elementary- 
arithmetical, in which arithmetic is used, e.g., in the Archimedean axiom 
for the calculus of line segments $\bigwedge_{x>0} \bigwedge_{y>0} \bigvee_n \; n \cdot x > y$ ($n$ a natural 
number), and further: elementary-logical and non-elementary, which are 
characterized by the fact that the relation symbols occur as variables 
(e.g., in the induction axiom for the Peano system). The last two types 
are distinguished by the linguistic-logical means employed, in a way 
which cannot be described here for lack of space. 

Can it happen that a system of axioms uniquely characterizes a con- 
figuration up to isomorphism, such a structure being called monomorphic? 
The answer here is affirmative, as is shown by the example of a vector 
space of given dimension over the field of rational numbers (which is an 
elementary-arithmetical structure); but the most important algebraic 
structures, such as group, ring, field, and lattice, are not monomorphic. 

Here we have tried to give some indication of possible questions in a 
theory of structure. What we have said is perhaps enough to show that 
we are dealing here with new points of view, from which an attempt is 
made to survey the whole of mathematics. 


CHAPTER 11 


Zorn’s Lemma and the High Chain Principle 


The present chapter deals with two maximal principles in the theory of 
sets: Zorn’s lemma, which has been used very frequently in recent times, since 
it simplifies many former proofs; and the high chain principle (cf. §4), which, 
although trivially equivalent to Zorn’s lemma, has the advantage of being 
intuitively plausible. The key position of the high chain principle in this type 
of argument appears to have remained unnoticed up to now.¹ 


1. Ordered Sets 


We first give some definitions and a few of their immediate consequences 
(cf. also IB9, §1; IA, §8.3). 

By an ordered set, or an order, we mean a set $M = \{a, b, c, \ldots\}$ together 
with a two-place relation $\le$ defined on it, with the following properties: 

Reflexivity: $a \le a$ for all $a$. 
Identivity: $a \le b$ and $b \le a$ imply $a = b$. 
Transitivity: $a \le b$ and $b \le c$ imply $a \le c$. 

An ordered set is said to be totally (or linearly) ordered, or to be a chain, 
if it has the following additional property: 
Comparability, or connexity: for any two elements $a$, $b$ we have 

$a \le b$ or $b \le a$. 


The terminology varies in the literature. It is also common to refer to our 
ordered set as a “partially ordered set,” and to restrict the term “ordered set” 
to chains. Moreover, a distinction is often made between an ordered set and 
an order, the latter term being used to refer only to the relation defined on an 
ordered set. 

1 For other set-theoretic maximal principles and their equivalence to the axiom of 
choice see §4, exercise 12, and, for example [12], [1], [6], in the bibliography at the end 
of the chapter. 


Examples of orders are given by the “set-orders”: if $S$ is an arbitrary 
system of subsets of a set $A$, then $S$ becomes an ordered set under the 
relation of inclusion ($\subset$). This class of examples already includes, up to 
isomorphism, all possible orders: every order $M$ is isomorphic to a 
set-order. For if $a \in M$ and we let $\bar{a}$ be the set of elements $x \in M$ with 
$x \le a$, then the mapping $a \rightarrow \bar{a}$ is an isomorphism of $M$ onto a set-order. 

In an ordered set we write $a < b$ to mean “$a \le b$ with $a \ne b$,” and 
$a \ge b$ or $a > b$ to mean $b \le a$ or $b < a$, respectively. 

Let $T$ be a subset of an ordered set $M$. Then $T$ itself is an ordered set 
under the same relation $\le$. In particular, a subset of a chain is also a 
chain. The statement “$x \le a$ (or $x < a$) for all $x \in T$” is abbreviated 
to $T \le a$ (or $T < a$). We now make the following definitions: 


To say that $s$ is an upper bound of $T$ means that $T \le s$. 

If here $s \notin T$ (i.e., $T < s$), then $s$ is a proper upper bound of $T$. 

To say that $g$ is a greatest element of $T$ means that $g \in T$ and $T \le g$ 
(i.e., $g$ is an upper bound of $T$ contained in $T$). Thus an upper bound of 
$T$ is either a proper upper bound of $T$ or a greatest element of $T$. 

To say that $m$ is a maximal element of $T$ means that $m \in T$ and that there 
exists no $x \in T$ with $x > m$. 


A subset $T$ need not necessarily have an upper bound or a greatest 
element or a maximal element. But obviously $T$ can have at most one 
greatest element, though it may have several maximal elements, and 
also, of course, several upper bounds. A greatest element is always a 
maximal element, but the converse is in general false, although the two 
concepts coincide if $T$ is a chain. 

The concepts lower bound, least element, minimal element of $T$ are 
defined dually (i.e., with $\ge$ in place of $\le$). 
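
For a finite ordered set the notions just defined can be checked mechanically. The following Python sketch is meant only as an illustration: the example set (the integers 1 to 12 ordered by divisibility) and all function names are our own assumptions and do not occur in the text.

```python
def leq(a: int, b: int) -> bool:
    """Order relation of the example: a <= b means 'a divides b'."""
    return b % a == 0


def upper_bounds(T, M):
    """All s in M with T <= s."""
    return {s for s in M if all(leq(x, s) for x in T)}


def greatest(T):
    """The greatest element of T, or None if T has none."""
    found = [g for g in T if all(leq(x, g) for x in T)]
    return found[0] if found else None


def maximal(T):
    """All m in T such that no x in T lies strictly above m."""
    return {m for m in T if not any(leq(m, x) and x != m for x in T)}


M = set(range(1, 13))   # hypothetical example: {1, ..., 12} under divisibility
T = {2, 3, 4, 6}

print(upper_bounds(T, M))   # {12}: a proper upper bound, since 12 is not in T
print(greatest(T))          # None: T has no greatest element
print(maximal(T))           # {4, 6}: two maximal elements
print(greatest({1, 2, 4}))  # 4: in a chain, greatest and maximal coincide
```

The subset T shows that several maximal elements can exist without a greatest element, whereas in the chain {1, 2, 4} the two concepts coincide.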


Exercises 


1. Every finite ordered set has at least one maximal element. 


2. (a) If M is an ordered set in which every two elements have an upper 
bound, then every maximal element of M is also a greatest element 
of M. 
(b) If M is finite, the converse of (a) also holds (cf. ex. 1). 


3. If $a < b$ and there exists no $y$ with $a < y < b$, then $a$ is called a lower 
neighbor of $b$, and $b$ is an upper neighbor of $a$. Does there exist a chain 
in which every element has an upper neighbor but infinitely many 
elements have no lower neighbor? 

4. Prove that in every infinite chain in which every nonempty subset has 
a smallest element there exists an ascending sequence, i.e., a sequence 
$a_1, a_2, a_3, \ldots$ with $a_1 < a_2 < a_3 < \cdots$. 

5. An ordered set in which every nonempty subset has a least and a 
greatest element must be a finite chain (and conversely). 


6. In every ordered set M the following statements are equivalent: 

I. Every nonempty subset of M has at least one maximal element. 
II. Every nonempty chain has a greatest element. 

III. (“Ascending chain condition.”) There exists no ascending 
sequence, i.e., for every $a_1 \le a_2 \le a_3 \le \cdots$ there exists an $n$ with 
$a_n = a_{n+1} = a_{n+2} = \cdots$. 

III’. Every “finite-below” chain is also “finite-above”; i.e., if for every 
element $a$ in a given chain there are only finitely many elements 
below $a$, then for every element $b$ in the chain there are only 
finitely many elements above $b$. 

III’’. Every finite-below chain is finite. 

7. Construct (e.g., by drawing their “order diagrams” as in the chapter on 
lattices) all the ordered sets with fewer than five elements. (There are 
exactly 25 such sets, apart from isomorphism; five with 3 elements, 
and sixteen with 4.) 
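
The counts quoted in exercise 7 can be confirmed by brute force. The sketch below is our own illustration, not a method from the text: it represents an order on n points by its strict relation, keeps the relations that are transitive, and identifies isomorphic ones through a canonical relabelling. The total of 25 is obtained when the empty ordered set is counted as well.

```python
from itertools import permutations, product


def is_strict_order(rel):
    """rel: set of pairs (a, b) with a != b, read as a < b.

    Irreflexivity holds by construction, so only transitivity is tested;
    a symmetric pair (a, b), (b, a) fails because (a, a) is never in rel.
    """
    return all((a, d) in rel for (a, b) in rel for (c, d) in rel if b == c)


def count_orders(n):
    """Number of orders on n elements, counted up to isomorphism."""
    points = range(n)
    pairs = [(a, b) for a in points for b in points if a != b]
    classes = set()
    for bits in product((False, True), repeat=len(pairs)):
        rel = frozenset(p for p, keep in zip(pairs, bits) if keep)
        if not is_strict_order(rel):
            continue
        # canonical representative: the smallest relabelled copy of rel
        canon = min(tuple(sorted((s[a], s[b]) for a, b in rel))
                    for s in permutations(points))
        classes.add(canon)
    return len(classes)


counts = [count_orders(n) for n in range(5)]
print(counts, sum(counts))
```

The search space grows very quickly with n, so this approach is only practical for the small cases of the exercise.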

8. Let a set $M = \{a, b, c, \ldots\}$ be said to be ordered if there is given on $M$ 
a two-place relation $<$ with the two properties: 

Irreflexivity: $a \nless a$ for all $a$. 
Transitivity: $a < b$ and $b < c$ imply $a < c$. 

Prove that this definition of order is equivalent to the one given above, 
in the following sense: if a given relation $\le$ is reflexive, identive and 
transitive, then the relation $<$, defined by $a < b$ if $a \le b$ and $a \ne b$, 
is irreflexive and transitive, and conversely, if a given relation $<$ is 
irreflexive and transitive, then the relation $\le$, defined by $a \le b$ if 
$a < b$ or $a = b$, is reflexive, identive and transitive. 


2. Zorn’s Lemma 


After these preliminary remarks we now formulate Zorn’s lemma.² 


Z. An ordered set in which every chain has an upper bound contains a 
maximal element. 

The role of Zorn’s lemma may be described as follows: in arguments 
involving infinite sets, the older proofs often made use of the well-ordering 
theorem and transfinite induction (for these concepts see IA, §7.4, appendix 
to IB1, §§2, 3, 5 and for example, [6], [12], [16]). In general, the well- 
ordering used in the proof has nothing to do with the underlying structure 
of the set or with the theorem to be proved; the well-ordering theorem 
merely provides a proof that the set in question admits at least one well- 
ordering, the particular nature of which is unknown and irrelevant, 
and this well-ordering is made the basis of a transfinite induction. But 
in spite of its correctness such a procedure is usually felt to be unsatis- 
factory. In many cases Zorn’s lemma allows us to avoid these unsatis- 
factory arguments and to replace them by a more natural method of proof; 
for the most part, the proofs become much clearer and shorter. 

2 The name Kuratowski’s lemma would be more correct (cf. [8] 1922, statement (42), 
[21] 1935). 

Some examples of proofs by Zorn’s lemma will be given in the next 
section. The proof of the lemma itself is given in §4. 

In most applications Zorn’s lemma is used in the following special form, 
which refers to set-orders and makes a sharper assumption on the upper 
bound: 


Z’. Let $S$ be a nonempty system³ of subsets of a set $A$ which with every 
nonempty chain contains its union. Then S contains a maximal element 
(i.e., a subset of A that is maximal in S). 


It is to be noted that in Z (and thus also in Z’) the assertion can be 
sharpened: 

Sharpened form of Z or of Z’: under the same assumptions as in Z or 
Z’, for every element there exists a maximal element over it. 

For let $M$ be an ordered set satisfying the assumptions of Z, let $a \in M$ 
and let $N$ be the subset of $x \in M$ with $x \ge a$. Then it is obvious that 
$N$ is also an ordered set satisfying the assumptions of Z and the assertion 
follows by the application of Z to $N$. 


3. Examples of the Application of Zorn’s Lemma 


We shall now prove three algebraic theorems by means of Zorn’s lemma. 
Further examples of proofs based on Zorn’s lemma are easy to find in 
the recent literature on topics in algebra or topology. 


Theorem 1. In a commutative ring $R$ with unit element every ideal 
distinct from $R$ is contained in a maximal ideal.⁴ 

An ideal $M$ in $R$ is said to be maximal if $M \ne R$ and there is no other 
ideal between $M$ and $R$ (in other words, if $M$ is a maximal element in 
the set-order of the ideals $\ne R$). 


3 The word “system” will be used as a synonym of “set.” 
4 For the definitions of “ring” and “ideal of a ring” see IB5, §1.2, §3.1. 

Proof of theorem 1. Let $J_0$ be an ideal in the given ring $R$ with 
$J_0 \ne R$. Let $S$ be the set of ideals $J$ with $J \supseteq J_0$, $J \ne R$. 

We show that S satisfies the assumptions of Z’ (with A = R). The 
assertion then follows by application of Z’ to S. 

Since $J_0 \in S$ we see that $S$ is nonempty. Let $K$ be a nonempty chain in $S$ 
and let $V$ be the union of $K$ (i.e., the set-theoretic union of all the ideals 
in $K$). Then we must show that $V \in S$; that is to say, 

a) $V$ is an ideal in $R$, 
b) $V \supseteq J_0$, 
c) $V \ne R$. 

As for a): arguments similar to the proof about to be given for a) occur 
everywhere in the applications of Zorn’s lemma; we give such an argument 
in detail here once for all: if $a, b \in V$ and if $r \in R$, then there exist $J, J' \in K$ 
with $a \in J$, $b \in J'$, and since $K$ is a chain, we have $J \subset J'$ or $J' \subset J$. Without 
loss of generality we may assume $J' \subset J$. Then $a, b \in J$, and therefore 
$a - b$, $ra \in J$ (since $J$ is an ideal) and thus also $\in V$, so that $V$ is an ideal. 

As for b): since $K$ is nonempty, there exists a $J \in K$, and for this $J$ we 
have $J_0 \subset J \subset V$, from which it follows that $V \supseteq J_0$. 

As for c): for the ideals $J$ of a ring $R$ with unit element it is clear that 
$J = R$ if and only if $1 \in J$. 

From $V = R$ it would follow that $1 \in V$, so that there would exist a 
$J \in K$ with $1 \in J$ and then for this $J$ we would have $J \in S$ and $J = R$, 
in contradiction to the definition of $S$. 

Remark on theorem 1. For a not necessarily commutative ring R 
with unit element it is obvious that the corresponding statements for 
left ideals, right ideals, and two-sided ideals can be proved in exactly 
the same way. 
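
In a finite ring the statement of theorem 1 needs no maximal principle, but it can at least be observed directly. The following sketch is our own illustration under stated assumptions: it works in the residue class ring Z/nZ, uses the standard fact that every ideal of this ring is generated by a divisor of n, and lists the maximal elements of the set-order S of proper ideals over a given ideal. All names in it are hypothetical.

```python
def ideal(d, n):
    """The ideal of Z/nZ generated by d, as a frozenset of residues."""
    return frozenset((d * k) % n for k in range(n))


def proper_ideals(n):
    """All ideals of Z/nZ other than the whole ring (d = 1 is excluded)."""
    return {ideal(d, n) for d in range(2, n + 1) if n % d == 0}


def maximal_ideals_over(J0, n):
    """Maximal elements of S = {J : J contains J0, J is a proper ideal}."""
    S = {J for J in proper_ideals(n) if J0 <= J}
    return {J for J in S if not any(J < J2 for J2 in S)}


n = 60
found = maximal_ideals_over(ideal(12, n), n)
print(sorted(min(J - {0}) for J in found))   # smallest nonzero element of each ideal
```

For the ideal generated by 12 in Z/60Z the maximal ideals over it are those generated by 2 and by 3; the ideal generated by 5 is also maximal in the ring but does not contain 12.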


Theorem 2. Every vector space has a basis. 


More precisely, we show that every (not necessarily finite-dimensional) 
vector space $V$ over a skew field $K$ has a basis.⁵ 

A (not necessarily finite) subset $T$ of $V$ (more precisely, an indexed 
subset) is said to be linearly independent if each of its finite subsets is linearly 
independent (in the usual sense). The set $T$ is called a generating system 
for $V$ if $T$ is not contained in any proper subspace of $V$. By a basis of 
$V$ we mean a linearly independent generating system of $V$. 


5 For the definitions of “skew field,” “vector space,” “subspace (vector subspace)” 
and “linearly independent (for finite sets of vectors)” see IB3, §1.1-1.4. 


Proof of Theorem 2. We may assume that $V$ does not consist of the 
zero vector alone (otherwise the empty set is a basis of $V$). Let $S$ be the 
aggregate of all linearly independent subsets of $V$. Then $S$ obviously 
satisfies the assumptions of Z’, with $A = V$ (cf. the remarks under a) 
in the proof of theorem 1), so that by Z’ there exists a maximal linearly 
independent subset of $V$.⁶ Thus it only remains to show: 

Every maximal linearly independent subset $B$ of $V$ is a generating 
system of $V$ (and thus also a basis of $V$). 

If we assume that there exists a subspace $T$ of $V$ with $T \ne V$ and 
$B \subset T$, then $T \ne V$ would mean that there exists a vector $n \in V$ with 
$n \notin T$. Let $B' = B \cup \{n\}$. Then it is easy to see that $B'$ would also be 
linearly independent, so that $B$ would not be a maximal linearly inde- 
pendent subset. 
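
In the finite-dimensional case the maximal linearly independent subset can be reached by a finite search, which makes the last step of the proof easy to observe. The sketch below is only an illustration under our own assumptions: it works in the rational vector space of triples, tests independence by Gaussian elimination with exact fractions, and extends a given independent set greedily from a hypothetical list of candidate vectors.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of rational vectors, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def independent(vectors):
    return rank(vectors) == len(vectors)


def extend_to_maximal(start, candidates):
    """Adjoin candidates one by one as long as independence is preserved."""
    B = list(start)
    for v in candidates:
        if independent(B + [v]):
            B.append(v)
    return B


start = [(1, 1, 0)]
candidates = [(2, 2, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
B = extend_to_maximal(start, candidates)
print(B)
```

The result is maximal among the independent subsets of the candidate list that contain the starting vector, and since the candidates generate the whole space, it is a basis. This mirrors the sharpened form of Z’: a given independent set is extended to a maximal one.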


Theorem 3 (Theorem of Artin-Schreier). Every formally real field can 
be ordered (is orderable).⁷ 


By a domain of positivity of a field $K$ we mean a subset $P$ of $K$ with 
the following properties (here $-P$ denotes the set of all $-x$ with $x \in P$): 

1) $a, b \in P$ imply $a + b$, $ab \in P$, 
2) $0 \notin P$, 
3) $-P \cup \{0\} \cup P = K$. 

Not every field has a domain of positivity; for example, a field of 
characteristic $\ne 0$ cannot have one; on the other hand, there exist fields 
that have several. 

A field K with at least one domain of positivity is said to be orderable. 
If one of the domains of positivity in an orderable field is distinguished, 
we speak of an ordered field. More precisely: an ordered field is a pair K, P 
consisting of a field K and a domain of positivity P in K. 

In an ordered field $K$, $P$ a relation $a < b$ is defined by $b - a \in P$. 
For $a = 0$ it follows from this definition of $<$ that $P$ is the set of elements 
$> 0$, a fact which explains the name “domain of positivity” for $P$. 

If in an ordered field $K$, $P$ we set $R = P \cup \{0\}$, then $a, b \in R$ imply 
$a + b$, $ab \in R$, and we have $-R \cap R = \{0\}$ and $-R \cup R = K$; and if 
the relation $a \le b$ (i.e., $a < b$ or $a = b$) is defined by $b - a \in R$, 
then $\le$ is a total order on $K$ which is compatible with addition and 
multiplication in $K$ (i.e., $a \le b$ implies $a + c \le b + c$ for all $c$ and 
implies $ac \le bc$ for all $c$ with $0 \le c$). These facts enable us to provide 
equivalent definitions of an ordered field, in the following way: 

A field is said to be formally real if $-1$ cannot be represented as the 
sum of squares; or equivalently, if $a_1^2 + a_2^2 + \cdots + a_n^2 = 0$ implies 
$a_1 = a_2 = \cdots = a_n = 0$. 


6 From the sharpened form of Z’ we see that every linearly independent subset of $V$ 
can be extended to a maximal linearly independent subset of $V$. 
7 For theorem 3 see also IB1, §2.5, §3.4 and IB8, §2.2. 


We note that the converse of theorem 3 is trivial; for in an ordered 
field every square, and consequently every sum of squares, is $\ge 0$, but 
$-1$ is $< 0$. Theorem 3 thus gives an “algebraic” characterization (i.e., 
a characterization in terms of the operations $+$ and $\cdot$ alone) of the 
orderable fields: a field is orderable if and only if it is formally real. 
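
The remark that a field of characteristic different from 0 has no domain of positivity can be made concrete for the prime fields: there $-1$ is a sum of squares, so these fields are not formally real. The following sketch is our own illustration; the function name is hypothetical, and the search is limited to one or two squares, which suffices for every prime tried below.

```python
def minus_one_as_squares(p):
    """Residues whose squares sum to -1 modulo the prime p (one or two of them)."""
    root_of = {(a * a) % p: a for a in range(1, p)}   # nonzero squares with a root
    target = (-1) % p
    if target in root_of:
        return [root_of[target]]
    for square, a in root_of.items():
        rest = (target - square) % p
        if rest in root_of:
            return [a, root_of[rest]]
    raise ValueError("no representation with at most two squares found")


for p in (2, 3, 5, 7, 11, 13):
    roots = minus_one_as_squares(p)
    # the printed sum of squares is congruent to -1 modulo p
    print(p, roots, sum(x * x for x in roots) % p)
```

In each case the last printed value equals p − 1, that is, $-1$ modulo p.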

Proof of theorem 3. Let $K$ be a formally real field. Let $Q$ be the set 
of all nonzero sums of squares of elements in $K$. Let $S$ be the set of all 
those subsets $T$ of $K$ that contain $Q$ and have the properties 1), 2). 

It is obviously enough to show: 

a) $S$ satisfies the assumptions of Z’, with $A = K$. 
b) Every maximal element of $S$ has the property 3). 

As for a): since $K$ is formally real, $Q$ is exactly the set of all $\sum_{i=1}^{n} a_i^2$ 
with $a_i \in K$, $a_i \ne 0$, $n \ge 1$. Thus $Q \in S$, so that $S$ is not empty. The fact 
that with every nonempty chain the set $S$ also contains its union is proved 
in the same way as under a) in the proof of theorem 1. 

As for b): for $T \in S$ let $T_0 = \{0\} \cup T$. Then b) is a consequence of the 
following lemma. 


Lemma. If $T \in S$ and $r \notin -T \cup \{0\} \cup T$, then $T' = T + rT_0$ (the set 
of all $a + rb_0$ with $a \in T$, $b_0 \in T_0$) is an element of $S$ properly 
containing $T$. (Thus $T$ is not maximal.) 


Proof of the lemma. Obviously $T \subset T'$ (we may choose $b_0 = 0$), and 
thus $Q \subset T$ implies $Q \subset T'$. 

Since $1 \in Q \subset T$ and $-r \notin T$ (for otherwise $r \in -T$) we have $-r \ne 1$, 
so that $r + 1 \ne 0$. Thus 

$r = \left(\frac{2r}{r+1}\right)^2 + r \left(\frac{r-1}{r+1}\right)^2$, 

so that $Q \subset T$ implies $r \in T'$. Since $r \notin T$, it follows that $T' \ne T$, so that 
$T \subsetneq T'$. Thus we need only verify the properties 1), 2) for $T'$. 


1) For any two elements $a + rb_0$, $c + rd_0 \in T'$ it follows from $Q \subset T$, 
and from the additive and multiplicative closedness of $T$ and $T_0$ that 

$(a + rb_0) + (c + rd_0) = (a + c) + r(b_0 + d_0) \in T'$ 
and 
$(a + rb_0)(c + rd_0) = (ac + r^2 b_0 d_0) + r(ad_0 + b_0 c) \in T'$. 

2) $0 \in T'$ would imply $0 = a + rb_0$ with $a \in T$, $b_0 \in T_0$; $a = -rb_0$; 
$b_0 \ne 0$ (since $a \ne 0$), so that $b_0 = b$ with $b \in T$; and thus $-r = a/b = (1/b)^2 ab$ 
with $a, b \in T$; $-r \in T$ (since $Q \subset T$ and $T$ is multiplicatively closed); 
consequently $r \in -T$, in contradiction to the assumption of the lemma. 
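
The proof of the lemma rests on writing $r$ in the form $q_1 + r q_2$ with squares $q_1$, $q_2$, which is possible whenever $r + 1 \ne 0$: one may take $q_1 = (2r/(r+1))^2$ and $q_2 = ((r-1)/(r+1))^2$. The short sketch below, our own illustration with a hypothetical function name, checks this identity with exact rational arithmetic.

```python
from fractions import Fraction


def decompose(r):
    """Return squares (q1, q2) with r == q1 + r * q2; requires r != -1."""
    r = Fraction(r)
    if r == -1:
        raise ValueError("the identity needs r + 1 != 0")
    q1 = (2 * r / (r + 1)) ** 2
    q2 = ((r - 1) / (r + 1)) ** 2
    return q1, q2


for r in (Fraction(1, 2), 3, -5, Fraction(-7, 3)):
    q1, q2 = decompose(r)
    print(r, q1, q2, q1 + Fraction(r) * q2 == r)
```

For $r \ne 0$ the first square $q_1$ is itself nonzero, which is what places it in $Q$ in a field of characteristic 0.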


4. Proof of Zorn’s Lemma from the Axiom of Choice 


In this section we introduce the high chain principle mentioned in the 
introduction. Zorn’s lemma at once turns out to be nothing but a more 
complicated form of the high chain principle. In the rest of this section, a 
simple proof of the high chain principle (and thus of Zorn’s lemma) from 
the axiom of choice is given in full detail. 


Let us first give the definition and some trivial properties of the operation 
“roof” (denoted by $\hat{\ }$), on which this section will depend. 

Let $M$ be an ordered set. Here and below the word “chain” will 
always refer to a subchain of $M$. The elements of $M$ will be denoted by 
$a, b, c, \ldots, x, y$, the subsets of $M$ by $A$, $B$, the chains by $C$, $K$, $L$ and the 
empty set by $\emptyset$. 

For every subset $A$ of $M$ let $\hat{A}$ be the set of all elements $x$ with $A < x$. 
Thus $\hat{A}$ is the set of all proper upper bounds of $A$. 

Obviously $A \cap \hat{A} = \emptyset$, and $A \subset B$ implies $\hat{B} \subset \hat{A}$. 

(1) If $A$, $B$ are arbitrary subsets of $M$, at least one of the two sets $A \cap \hat{B}$, $\hat{A} \cap B$ 
is empty. 

For otherwise there would exist elements $a$, $b$, with $a \in A$, $B < a$ and $b \in B$, 
$A < b$, which would imply $b < a$ and $a < b$, in contradiction to the identivity. 

(2) If $A$, $B$ are subsets of $M$ with $A \subset B \cup \hat{B}$ and $B \subset A \cup \hat{A}$, then $A \subset B$ 
or $B \subset A$. 

For by (1) we have $A \cap \hat{B} = \emptyset$, so that $A \subset B$, or $\hat{A} \cap B = \emptyset$, so that $B \subset A$. 

(3) If $K$ is a chain and $C \subset K$, then: $\hat{C} = \hat{K}$ is equivalent to $\hat{C} \cap K = \emptyset$. 

Proof. $\hat{C} = \hat{K}$ implies $\hat{C} \cap K = \hat{K} \cap K = \emptyset$. Conversely, from $\hat{C} \cap K = \emptyset$ 
we have, in succession: for every $x \in K$ it is untrue that $C < x$; for every $x \in K$ 
there exists a $c \in C$ with $c < x$ untrue, i.e., with $x \le c$ (since $c$, $x$ are comparable, 
being elements of the same chain $K$); $\hat{C} \subset \hat{K}$; $\hat{C} = \hat{K}$ (for $C \subset K$ always 
implies $\hat{C} \supseteq \hat{K}$). 


We say that a chain $K$ is high, and we call it a high chain (with respect 
to $M$) if $\hat{K}$ is empty. Thus a high chain is a chain that has no proper 
upper bound, i.e., a chain with no element properly over it, a chain that 
cannot be continued upward. 


Note that a high chain need not be a maximal chain, i.e., maximal in the 
set-order consisting of the chains of $M$ (although, of course, every maximal 
chain is a high chain). For example, if $m$ is a maximal element of $M$, the chain 
consisting of $m$ alone is a high chain, but in general it will not, of course, be a 
maximal chain. 
This difference between the concept of high chain and maximal chain marks 
the difference between the high chain principle and the Hausdorff-Birkhoff 
maximal chain principle. 
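
On a finite ordered set the roof operation and the two kinds of chains can be computed directly. The sketch below is our own illustration: the example set (a few integers ordered by divisibility) and the function names are assumptions, and the code merely evaluates the definitions of proper upper bound, high chain, and maximal chain.

```python
from itertools import combinations

M = frozenset({1, 2, 3, 4, 5, 6, 10, 12})   # hypothetical example, ordered by divisibility


def less(a, b):
    """Strict order of the example: a < b means a divides b and a != b."""
    return a != b and b % a == 0


def roof(A):
    """The set of all proper upper bounds of A in M, i.e. all x with A < x."""
    return frozenset(x for x in M if all(less(a, x) for a in A))


def is_chain(A):
    return all(less(a, b) or less(b, a) for a, b in combinations(A, 2))


def is_high(K):
    """A chain is high if its roof is empty."""
    return is_chain(K) and not roof(K)


def is_maximal_chain(K):
    """A chain is maximal if no further element of M can be adjoined to it."""
    return is_chain(K) and not any(is_chain(K | {x}) for x in M - K)


print(roof(frozenset({2, 3})))              # frozenset({6, 12})
print(is_high({12}), is_maximal_chain({12}))  # True False
print(is_maximal_chain({1, 2, 4, 12}))      # True
```

The chain consisting of the maximal element 12 alone is high without being a maximal chain, which is exactly the situation described in the note above.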


Let us now formulate the 
High Chain Principle. Every Ordered Set Contains a High Chain.⁸ 


This maximal principle makes no hypothesis about the given ordered set, 
and it has an intuitive acceptability which is independent of any proof — 
both in contrast to Zorn’s lemma. Nevertheless, it is in fact identical with 
Zorn’s lemma, as we shall now see. 

There are two kinds of high chains: high chains without upper bound, 
and high chains with upper bound. The high chains without upper bound 
are precisely the chains without upper bound. The upper bounds of high 
chains are precisely the greatest elements of high chains, and so precisely 
the maximal elements of M. Thus the high chains with upper bound are 
precisely the chains that contain a maximal element (of M). These remarks 
show at once that: 


Zorn’s Lemma and the High Chain Principle are Equivalent. 


For in an arbitrary ordered set M the following statements are equivalent 
(the first one being Zorn’s lemma and the last one the high chain 
principle): 


If every chain has an upper bound, there exists a maximal element. 
There exists a chain without upper bound or there exists a maximal 
element. 

There exists a high chain without upper bound, or there exists a high 
chain with upper bound. 

There exists a high chain. 


The proof of the high chain principle from the axiom of choice, which 
we now give, is the last step in a gradual development beginning with 
Zermelo’s first proof of the well-ordering theorem ([19], 1904). For 
example, Kneser’s proof of Zorn’s lemma ([7], 1950), and Weston’s 
outline of a proof ([17], 1957), which forms the basis of the proof to be 
given here, are steps in this development toward simplicity. 

The proof makes use of the so-called Axiom of Choice (cf. IA, §7.6, 
supplement to IB1, §5): 

Axiom of Choice. For every system $S$ of nonempty sets there exists 
a choice function, i.e., a function $f$ which to every set $N \in S$ assigns an 
element of $N$: thus $f(N) \in N$. 


8 Of course, the high chain principle can be sharpened to the statement that in an 
ordered set every chain $K$ is an initial segment of a high chain (it is only necessary to 
apply the high chain principle to the ordered set $\hat{K}$). 

Proof of the high chain principle. Let $M$ be an ordered set. By the 
axiom of choice there exists a choice function defined on the system of 
all nonempty sets $\hat{C}$ (where $C$ is a chain). Let $f$ be such a function. Then 
$\hat{C} \ne \emptyset$ implies $f(\hat{C}) \in \hat{C}$. 

The proof depends on the concept of an f-chain. A chain $K$ is called 
an f-chain if it has the following property: 

(*) $C \subset K$ and $\hat{C} \cap K \ne \emptyset$ imply that $f(\hat{C})$ is the least element of $\hat{C} \cap K$, 
i.e., that $f(\hat{C}) \in \hat{C} \cap K$ and $f(\hat{C}) \le \hat{C} \cap K$. 

In other words: If $C$ is a subchain of $K$ with proper upper bound in $K$, 
then $f(\hat{C})$ is the least of these proper upper bounds.⁹ 
In view of (3) and $f(\hat{C}) \in \hat{C}$ the property (*) is equivalent to 

(**) $C \subset K$ and $\hat{C} \ne \hat{K}$ imply that $f(\hat{C}) \in K$ and $f(\hat{C}) \le \hat{C} \cap K$. 


The proof consists in deriving two rules for the creation of f-chains 
((i), (ii)) and applying them to the set-theoretic union of all f-chains. 

(i) Continuation of f-chains: if $K$ is an f-chain with $\hat{K} \ne \emptyset$, then 
$K^* = K \cup \{f(\hat{K})\}$ is an f-chain (and, of course, $K^* \ne K$). 

Proof. $K < f(\hat{K})$ implies that $K^*$ is a chain with greatest element 
$f(\hat{K})$, and $K^* \ne K$. 

Assume that $C \subset K^*$ and that $\hat{C} \cap K^*$ is nonempty. Let $s \in \hat{C} \cap K^*$. 
Then it follows successively that $C < s \le f(\hat{K})$; $f(\hat{K}) \notin C$; $C \subset K$. 

If now $\hat{C} = \hat{K}$, then $f(\hat{C}) = f(\hat{K})$ and $\hat{C} \cap K^* = \hat{K} \cap K^* = \{f(\hat{K})\}$. 

On the other hand, if $\hat{C} \ne \hat{K}$, it follows from (**) that $f(\hat{C}) \in K$ (so that 
$f(\hat{C}) \in K^*$ and $f(\hat{C}) < f(\hat{K})$) and $f(\hat{C}) \le \hat{C} \cap K$. 

Thus in every case $f(\hat{C}) \in K^*$ and $f(\hat{C}) \le \hat{C} \cap K^*$. 

The crucial point in the proof of (ii) is the following lemma. 


Lemma. If $K$, $L$ are f-chains, then $L \subset K \cup \hat{K}$ (and, of course, also 
$K \subset L \cup \hat{L}$). 


Proof of lemma. For $L \subset K$ there is nothing to prove. Consequently, 
assume $L \not\subset K$ and let $y$ be an arbitrary element with $y \in L$, $y \notin K$. The 
assertion is that $y \in \hat{K}$. 

Let $C$ be the set of all $x$ with $x \in L \cap K$ and $x \le y$.¹⁰ Then $C \le y$ and 
(in view of $y \notin K$) $y \notin C$, so that $C < y$, i.e., $y \in \hat{C}$. 

Since $C \subset L$, $y \in \hat{C} \cap L$ we have from (*): $f(\hat{C}) \in L$ and $f(\hat{C}) \le y$. 

Since $C \subset K$, the hypothesis $\hat{C} \ne \hat{K}$ would (by (**)) imply $f(\hat{C}) \in K$ 
and therefore (by the definition of $C$) $f(\hat{C}) \in C$, which contradicts $f(\hat{C}) \in \hat{C}$. 
So $\hat{C} = \hat{K}$ and thus $y \in \hat{K}$ (since $y \in \hat{C}$). 


9 The function $f$ provides us with a “rule for climbing” that not only allows us to 
climb from a given element to a greater element but also to surmount, with one jump, 
a whole infinite chain; and the f-chains are the “upward paths” created in this way. 
Taking $C = \emptyset$ we see, in particular, from (*) that every nonempty f-chain begins with 
$f(\hat{\emptyset})$. 

From this lemma and (2) we get the comparability of f-chains: if $K$, $L$ 
are f-chains, then $K \subset L$ or $L \subset K$. 

(ii) Union of f-chains: the union $F$ of an arbitrary set of f-chains is also 
an f-chain. 

Proof. The comparability of f-chains shows that $F$ is a chain. From 
the lemma it follows further that $F \subset K \cup \hat{K}$ for every f-chain $K$. 

Now let $C \subset F$ and $\hat{C} \cap F$ be nonempty. Let $x$ be an arbitrary element 
of $\hat{C} \cap F$. Then, since $x \in F$, there exists an f-chain $K$ with $x \in K \subset F$, 
and it follows that $x \in \hat{C} \cap K$, so that $\hat{C} \cap K \ne \emptyset$. 

Since $C \subset F$, and $F \subset K \cup \hat{K}$, we have $C \subset K \cup \hat{K}$. Since $\hat{C} \cap K \ne \emptyset$, 
it follows from (1) that $C \cap \hat{K} = \emptyset$, so that $C \subset K$. 

Since $K$ is an f-chain, it follows from $C \subset K$, $\hat{C} \cap K \ne \emptyset$ that $f(\hat{C}) \in K$ 
and $f(\hat{C}) \le \hat{C} \cap K$, so that $f(\hat{C}) \in F$ and $f(\hat{C}) \le x$. 

Now let $V$ be the union of all f-chains. By (ii) we see that $V$ is an f-chain 
and consequently by (i) that $\hat{V} = \emptyset$, i.e., $V$ is a high chain. For if we had 
$\hat{V} \ne \emptyset$, then by (i) there would exist an f-chain $V^*$ with $V^* \not\subset V$, in 
contradiction to the definition of $V$. 
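
For a finite ordered set no transfinite argument is needed: starting from the empty chain, which is trivially an f-chain, rule (i) can be repeated until the roof is empty, and the chain reached in this way is high. The following sketch is our own illustration of this climbing process; the example set and the particular choice function (the numerically smallest candidate) are assumptions made only for the sake of the example.

```python
M = [1, 2, 3, 4, 5, 6, 10, 12]   # hypothetical example set, ordered by divisibility


def less(a, b):
    """Strict order of the example: a < b means a divides b and a != b."""
    return a != b and b % a == 0


def roof(A):
    """All proper upper bounds of A in M."""
    return [x for x in M if all(less(a, x) for a in A)]


def f(candidates):
    """A hypothetical choice function: pick the numerically smallest element."""
    return min(candidates)


def climb():
    """Repeat rule (i), K -> K* = K with f(roof(K)) adjoined, from the empty chain."""
    K = []
    while roof(K):
        K.append(f(roof(K)))
    return K


V = climb()
print(V)         # [1, 2, 4, 12]
print(roof(V))   # []: the chain is high
```

A different choice function would in general lead to a different high chain; for instance, choosing the largest candidate at each step stops immediately at 12.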


Remark. Let us denote the axiom of choice by A and the high chain 
principle by H. A trivial application of Z’ gives A, so that we have proved 
the implications $A \rightarrow H \leftrightarrow Z \rightarrow Z' \rightarrow A$. Thus, the set-theoretic maximal 
principles A, H, Z, Z’ are equivalent. 


Exercises 


We first give 3 definitions. 

(i) A subset $A$ of an ordered set $M$ is called an initial segment of $M$ if 
for every $a \in A$ and every $x \le a$ it follows that $x \in A$. 

(ii) For every subset $S$ of an ordered set $M$ the corresponding set $\check{S}$ of 
lower elements is defined as the set of elements $x \in M$ for which there 
exists an element $s \in S$ with $x \le s$. 

(iii) An ordered set $M$ is said to be well-ordered if every nonempty subset of $M$ has 
a least element. 

1. The relation of being an initial segment is transitive; i.e., every initial 
segment of an initial segment of an ordered set $M$ 
is an initial segment of $M$. 


10 In fact, $C = K$. 


2. The intersection, and also the union, of an arbitrary number of initial 
segments of an ordered set $M$ is an initial segment of $M$. 

3. Prove that for every subset $S$ of an ordered set $M$: 

(a) $\check{S}$ is the initial segment of $M$ generated by $S$; namely, $\check{S}$ is an 
initial segment of $M$ that contains $S$, and $\check{S}$ is the intersection of 
all initial segments of $M$ that contain $S$. 

(b) From (a) it follows that $S$ is an initial segment of $M$ if and only 
if $S = \check{S}$. 

(c) $\check{S}$ is an initial segment $A$ of $M$ with $S \subset A$ and $\hat{S} = \hat{A}$. (Thus, 
in the definition of an f-chain we could take $C$ to be an initial 
segment.) 

(d) $\check{S} \cap \hat{S} = \emptyset$. 


4. For every subset $S$ of a totally ordered set $K$ we have $\check{S} \cup \hat{S} = K$, and 
therefore the following three statements are pairwise equivalent 
(cf. 3(b), (d)): $S$ is an initial segment of $K$; $S = \check{S}$; $S \cup \hat{S} = K$. 

5. A subset $L$ of a chain $K$ of an ordered set $M$ is an initial segment of $K$ 
if and only if $K \subset L \cup \hat{L}$. 

6. (a) Every well-ordered set is totally ordered. 
(b) For finite sets the converse also holds. 


In exs. 7 to 11 below, the assumptions are the same as in the proof of the 
high chain principle; i.e., $M$ is an ordered set and $f$ is a choice function 
on the system of all nonempty sets $\hat{C}$ (where $C$ is a subchain of $M$). Then 
$M$ has the following properties (7-11): 


7. The intersection of arbitrarily many f-chains is an f-chain. 

8. The set of all f-chains is well-ordered with respect to inclusion. 

9. Every f-chain is well-ordered. 

10. A subset $L$ of an f-chain $K$ is an f-chain if and only if it is an initial 
segment of $K$ (use ex. 5 and the lemma of §4). 

11. The f-chains are precisely the initial segments of the union $V$ of all 
f-chains. 

12. Consider the following statements: 

(a) axiom of choice A 

(b) high chain principle H (cf. §4) 

(c) Zorn lemma Z 

(d) special case Z’ of the Zorn lemma (cf. §2) 

(e) well-ordering theorem W: “every set can be well-ordered” 

(f) Hausdorff-Birkhoff maximal chain principle M: “in every ordered 
set there exists a maximal (with respect to inclusion) chain,” 
i.e., a chain which ceases to be a chain if any further element of 
the ordered set is adjoined to it. 


Prove the following implications: 


A>H->-Z>W-A and H->Z—>Z’—>M-—H. 
In other words, the statements A, H, Z, Z’, W, M are pairwise equivalent. 
Hints for the proofs. 
For A → H and H → Z cf. §. 
The implications Z → Z′ and M → H are specializations. W → A: 
the union V of the given system S of nonempty sets can, by W, 
be well-ordered; for a fixed well-ordering of V choose the smallest 
element from each set of S. Z′ → M: the entire aggregate of chains of 
an ordered set M forms a system S satisfying the assumptions of Z′ 
(with A = M). Z → W: for an arbitrary set M let Ω be the set of all 
well-ordering relations defined on subsets of M. For ω₁, ω₂ ∈ Ω let 
ω₁ ≤ ω₂ be defined as follows: the domain of definition T₁ of ω₁ is 
contained in the domain of definition T₂ of ω₂, on T₁ the two relations 
ω₁ and ω₂ coincide, and T₁ is an initial segment of T₂ with respect 
to ω₂. With this relation ≤ the set Ω is an ordered set in which every 
subchain has an upper bound. Then Z states that Ω has a maximal 
element. But every maximal element ω of Ω must be defined on the 
whole of M, since an element of M not contained in the domain of 
definition of ω could be “adjoined to ω from above,” thereby 
producing an ω′ > ω. 


13. In the proofs of §3 it is possible, of course, to use the high chain 
principle instead of the Zorn lemma. For the proof of Theorem 2, 
for example, one may first prove (without the Zorn lemma and thus 
independently of the axiom of choice): the union of a high chain in 
the ordering of all linearly independent subsets of a vector space V 
is a basis of V. Then Theorem 2 follows immediately from this theorem 
and the high chain principle. What is the corresponding “quintessence” 
(i.e., formulation independent of the axiom of choice) of Theorem 1 
(of Theorem 3)? 
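
The hint for W → A in ex. 12 can be illustrated on a small finite system. The following Python sketch is only an illustration under stated assumptions: the function name `choice_from_well_ordering` and the representation of the well-ordering as a list (earlier in the list means smaller) are hypothetical and do not come from the text. For finite systems a choice function exists without any axiom, so the sketch shows the mechanism of the hint, not its force in the infinite case.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set


def choice_from_well_ordering(
    system: Iterable[Set[Hashable]],
    well_order: List[Hashable],
) -> Dict[FrozenSet[Hashable], Hashable]:
    """Return a choice function on `system`.

    `well_order` lists every element of the union V of the system exactly
    once; an element is "smaller" if it appears earlier in the list
    (an assumption made for this sketch).
    """
    # Position of each element of V in the fixed well-ordering.
    rank = {x: i for i, x in enumerate(well_order)}
    choice = {}
    for s in system:
        if not s:
            raise ValueError("the system must consist of nonempty sets")
        # Choose the smallest element of s with respect to the well-ordering.
        choice[frozenset(s)] = min(s, key=lambda x: rank[x])
    return choice


# A system S of nonempty sets and a fixed well-ordering of its union V.
system = [{"c", "a"}, {"b", "c"}, {"b"}]
well_order = ["c", "b", "a"]
f = choice_from_well_ordering(system, well_order)
```

Each set receives exactly one of its own elements, and the rule that selects it is the same for every set; this uniformity is what the well-ordering supplies.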


5. Questions Concerning the Foundations of Mathematics 


In the present chapter we have up to now taken the so-called “naive 
point of view” concerning sets (cf. IA, §1.4, §7.1, §7.2). But everything 
we have said here, and in particular the proof we have given for the high 
chain principle, could also be formulated in the usual axiomatic set 
theories (cf. IA, §7.1 and §7.6). In this sense the proof we have given 
in §4 for Z is correct and can be verified even by an intuitionist or a 
constructivist. 


But a constructivist would regard an axiomatic (formalistic) 
interpretation of the concept of “set” as meaningless and therefore without 
interest;¹³ he would admit only constructive interpretations, and from 
this point of view (cf. IA, §1.4, §1.5) he would find two mistakes, or at 
least gaps, in the proof given above in §4: 


1) In one place in the proof we made use of the axiom of choice without 
actually constructing a choice function (on this question see, for example, 
[4], Chapter II, §4). 

2) The set V was defined as the union of all f-chains, but it turned 
out later that V is itself an f-chain. Thus we have defined an object (the 
set V) by means of a concept (f-chain) under which the object itself is 
included. 


More precisely, from the constructive point of view the situation is 
somewhat as follows: the f-chains are (in general, infinite) subsets. 
The only possibility of constructing an infinite subset is to construct a 
“representing property” for it (namely, a propositional form in a suitable 
language). Every construction of representing properties for sets must be 
carried out by means of certain linguistic tools, which must either be given 
or constructed in advance. With more linguistic tools at our disposal 
we can construct more properties and thus represent more sets. But the 
totality of all linguistic tools can never have been constructed (for if we 
were to assume that this is the case, we could proceed to use these linguistic 
tools in order to create further ones), and thus, in a constructive 
interpretation, the expression “all f-chains” can never have an absolute 
meaning but only a relative one; it can only be understood in the sense 
of all f-chains “representable in a given language S.” If we now form the 
union V of this relative totality of chains, we do not know whether a 
representing property of V can be found in the language S, i.e., whether V 
itself belongs to this totality. But precisely this fact was used in the above 
proof, namely when we said: “if V̂ ≠ ∅, then V* = V ∪ {f(V̂)} is an 
f-chain, and thus V* ⊂ V.” For in order to draw the conclusion that 
V* ⊂ V, we must know that V* is an f-chain representable in S. Since 
it is obvious that V* is representable in S if and only if V is representable 
in S (the two sets differ only by a single element), we see that a constructive 
interpretation of our proof in a language S is correct only if the union V 
of all f-chains that are representable in S is itself representable in S. 

“Impredicative definitions,” like this definition of V, occur in many 
places in mathematics in its usual form, e.g., in the introduction of the 
real numbers (cf. IB1, §4.3 and [18]). An objection of the type 2) above 
was already raised by Poincaré against Zermelo’s first proof of the 
well-ordering theorem (cf. [11], [19], and Russell’s “vicious circle principle” in 
the introduction to [13]; see also [4] and the literature given there). 
A more precise examination of the whole question in the framework of 
P. Lorenzen’s operational mathematics is given in [10] (cf. IA, §10.6 and 
[9]). 

¹³ Even if it were proved that the underlying formalized set theory is free of 
contradictions. 

In operational mathematics every set is countable in a suitable language 
level. So let us note here that, for a countable ordered set, a constructive 
proof of the high chain principle can easily be given by complete induction. 
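
One way to picture such an inductive construction is the following Python sketch. It rests on assumptions that are not taken from the text: "high chain" is read as a chain with no proper upper bound, the ordered set is given by a finite enumeration `elements` together with a comparison `leq`, and the names `greedy_chain` and `proper_upper_bounds` are hypothetical. At step n the n-th element is adjoined if and only if it is comparable with everything adjoined so far; for a countably infinite set the same rule would be applied along the whole enumeration and the chain taken as the union of all stages.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def greedy_chain(elements: List[T], leq: Callable[[T, T], bool]) -> List[T]:
    """Build a chain by one pass through the enumeration `elements`."""
    chain: List[T] = []
    for x in elements:
        # Induction step: adjoin x only if it is comparable with every
        # element already in the chain.
        if all(leq(x, c) or leq(c, x) for c in chain):
            chain.append(x)
    return chain


def proper_upper_bounds(
    elements: List[T], leq: Callable[[T, T], bool], chain: List[T]
) -> List[T]:
    """Elements outside the chain that lie above every element of the chain."""
    return [
        u for u in elements
        if u not in chain and all(leq(c, u) for c in chain)
    ]


def divides(a: int, b: int) -> bool:
    """The ordering used in the example: a <= b means a divides b."""
    return b % a == 0


# Example: a few natural numbers ordered by divisibility, in a fixed enumeration.
elements = [6, 4, 3, 12, 2, 1, 8, 5]
chain = greedy_chain(elements, divides)
bounds = proper_upper_bounds(elements, divides, chain)
```

A proper upper bound of the resulting chain would be comparable with every element of the chain and would therefore have been adjoined at its own step, so `bounds` is empty here. No choice function is needed, because the enumeration already decides which element is examined next.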
Complexes of a group, 182 

Componentwise multiplication, 392 

Composition, 406; configuration, 511; 
factors, 216; of fields, 436; of groups, 
436; inner, 510; nonelementary, 510 

Composition series, 216, 354; of fields, 
438; of groups, 438 

Comprehension, axiom of, 58 

Computable function, 36 

Concepts, fundamental, 22, 26 

Condition(s): ascending chain, 524; 
basis, 339; divisor-chain, 360; factor 
chain, 495; maximal, 360; normality, 
265; orthogonality, 265 

Configuration(s), 251, 508, 509; 
automorphism group of the, 512; 
composition-, 511 

Congruence(s): algebraic, 396; pure, 
399; relation, 65, 514; modulo an 
ideal, 341; a subgroup, 380 

Congruent, 270 

Conjecture: Fermat, 11-12, 37, 398; 
Goldbach, 10, 76, 405 

Conjugate(s), 183; algebraic elements, 431; 
Cayley number, 481; complex, 461; 
fields, 437; quaternion, 475; system of, 
402n36; transposed matrix, 270 

Conjunction, 12 

Connectives: lattice-theoretic, 487; 
logical, 487; set-theoretic, 487 

Connex, 63 

Connexity, 522 

Consequence, 8, 20, 24, 46 

Consistency, 6, 109; relative, 31; 
semantic, 31; syntactic, 31 

Consistent algorithm, 39 

Constant function, 291 

Constants: propositional, 12; 
structure, 402-403n37 

Constructibility of a regular polygon, 454 

Constructive (point of view), 7 

Constructivist, 534-535; school, 7-8, 40 

Continuous function, 462 

Continuum, 55; hypothesis, 60; 
hypothesis, special, 60 

Contradiction, 23, 80 

Contragredient, 264 

Contraposition, 47 

Contravariant vector, 264 

Convention, Einstein summation, 239 

Conventionalism, 6 

Convergence criterion (Cauchy), 467n8 

Convergent of a continued fraction, 373 

Converse relation, 62 

Coordinates, 240 

Coprime, 332 

Coset: left, 186; right, 186 

Countable, 55, 151; at most, 55 

Countably infinite, 151n90 

Covariant: tensor, 264; vectors, 264 

Criterion(a): convergence (Cauchy), 
467n8; for divisibility, 385; 
irreducibility (Eisenstein), 347; for 
multiplicity of zeros, 426; for a 
quadratic residue function (Euler), 
400; for separability, 424; for 
subgroups, 184 

Crystallography, 204 

Cube, duplication doubling of, 417 

Curry, 40 

Cut, Dedekind, 50, 133, 135 

Cycle, 224 

Cyclic, 191; groups, 191; groups, 
fundamental theorem for, 192 

Cyclotomic: field, 405; polynomial, 
428, 430 


DE MOIVRE, formula of, 458 

Decidable, 36 

Decimal, infinite, 130 

Decomposition: canonical, 225; into 
partial fractions, 368 

DEDEKIND, RICHARD, 72, 403, 448, 483, 
488; chain theorem of, 504; cut, 50, 
133, 135; definition of infinity, 54 

Deduction, 41 

Deficient number, 371 

Definite, positive, 273 

Definition, 20-21 

Degree, 299; of an algebraic extension, 
420; of a representation, 220 

Denominator, 122; lowest common, 360 

Denotation (bedeutung), 10 

Density, 407 

Dependent, linearly, 240 

Derivation, 41; algorithmic, 53 

Derivative, 303; of a polynomial, 423 

DESARGUES, little theorem of, 482 

Description operator, 12, 18 

Determinant(s), 235, 279; expansion 
of a, 282; multiplication of, 280; 
Sylvester, 349 

Diagonal: form of a matrix, 260; 
procedure (Cantor’s), 152; procedure 
(Cantor’s), first and second, 55; 
sequence, 152 

Diagram(s), 186; of Hesse, 485; order, 485 

Dickson, L. E., 448 

Difference: left, 161; right, 161 

Division algebra(s), 402-403n37, 477; 
associative, 478n17; of finite rank, 478 

Digital sum, 385; alternating, 385; 
generalized, 385 

Dilatation(s): of the plane, 456; zero, 456 

Dilative rotations, 457; Hermitian, 469 

Dimension, 240, 505 

Dimensional equation, 505 

Dirac, δ-function, 5 

Direct: product, 198, 493; product of 
groups, 394; sum of (the ideals), 392 

Directed, 64; set, 64 

DIRICHLET, 406; pigeon-hole principle, 
102, 463 

Discriminant, 352, 402 

Disjoint, 52 

Disjunction, 13 

Distributions, 5 

Distributive: lattice, 68, 489; laws, 53, 
99, 115; laws, infinite, 490 

Divisibility: criteria for, 385; fundamental 
lemma of the theory of, 359 

Division: two-sided, 180 

Divisor(s): common, 358; 
complementary, 356; greatest 
common, 332, 358; prime, 358; 
proper, 357; trivial, 357; of zero, 119, 
293, 322; of zero, nilpotent, 342 

Divisor-chain, 330; condition, 360; 
proper, 330 

Domain(s), 62; first, 62; of a function, 
50; fundamental, 411; image, 247; of 
individuals, 21; integral, 119n47, 323; 
of integrity, 323; operator, 511; of 
positivity, 120, 464, 527; of scalars, 
235; second, 62; of transitivity, 221 

Dual, 487; axiom, 68; group, 483, 488, 
517; self-, 501; space, 263; vector 
space, 234 

Duality, principle of, 68 

Duplication doubling of the cube, 417 

Dyadic fractions, 130n60 


Echelon matrix, 259 

Eigenvalues, 286 

Eigenvectors, 286 

EINSTEIN, summation convention of, 239 

EISENSTEIN, irreducibility criterion of, 347 

“Either-or,” 13 

Element(s), 50; axiom for sets with one, 
60; conjugate algebraic, 431; exponent 
of a group, 193; G, order of the, 193; 
greatest, 490, 523; greatest common 
lower, 486; identity, 167; 
“imaginary,” 448; inverse, 111; least, 
490, 523; least common upper, 486; 
maximal, 164, 523; minimal, 523; 
neutral, 111, 167, 237; order of a 
group, 193; permutable, 168; prime, 
403; of a set, 51; superfluous, 497; 
unit, 66, 116, 167, 179, 321, 490; unity, 
321; zero, 179, 490 

Elementary: -arithmetical structure, 521; 
-logical structure, 521; ornament, 204; 
predicate logic, 73; symmetric 
functions, 302n23; symmetric 
polynomials, 307 

Elimination, 353; assumption-, 45; 
ideal, 354 

Empty: relation, 62; set, 52, 103; set, 
axiom for, 60; word, 231 

Endomorphism(s), 114, 128, 512; 
monotone, 146; multiplication of, 115; 
ring of, 116; sum of, 114 

Entire rational function, 292; in the 
sense of algebra, 301; in the sense of 
analysis, 301; of n arguments, 305 

Enumerable, recursively, 33, 35 

Enumerability, 35 

Equality, 108; of classes, 58; of value, 122 

Equation(s): characteristic, of a matrix, 
286; class, 218; dimensional, 505; 
Pell, 397 

Equivalence, 14; class, 65; of matrices, 
254; relations, 29, 65, 108; of sets, 
103; theorem (Bernstein), 54 

Equivalent, 54 

EUCLID, 28 

Euclidean: algorithm, 32, 332, 365; 
rings, 332, 361 

EULER: criterion for a quadratic 
residue, 400; function, 382 

Even transpositions, 227 

Excluded middle, law of, 8 

Existence-introduction, 48n23 

Existential quantifier, 17, 18 

Expansion: of a determinant, 282; 
Laplace, 283 


INDEX 


Exponent, 445; of a group element, 
193; of a root of unity, 445 

Exponential function, 150 

Expression, relevant, 76n 

Extended: matrix, 259; predicate logic, 
23n, 42, 72 

Extension(s), 51; algebraic, 418; 
algebraic, degree of an, 420; Galois, 
420; field, 297n11, 413; finite, 418; 
normal, 420; problem, 413; ring, 297; 
of a set, 51; separable, 422 

Extensionality, principle of, 51, 58 


F-chain, 531 

Factor, 75; chain condition, 495; 
composition, 216; group, 196; 
proper, 328 

Factorization: canonical, 332; rings, 
theorem for, 343; rings, unique, 331 

False, 10, 23 

FERMAT: conjecture, 11-12, 37, 398; 
number, 372; theorem, 382 

Field(s), 124, 323; alternative, 481; 
characteristic of a, 324; 
composition of, 436; composition 
series of, 438; conjugate, 437; 
cyclotomic, 405; extension (or 
subfield), 297n11, 413; finite, 440; 
formally real, 464, 527; 
intersection of, 436; invariant, 402; 
skew, 235, 324, 526n; skew, of 
quaternions, 470; multiplicative group 
of a, 324, 357; ordered, 527; power 
series, 311; prime, 324; quotient, 125, 
325; radical over a, 452; of rational 
numbers, 124; real-closed, 464; of 
real numbers, 141; of relations, 63; 
of sets, 53; union of, 436 

Field splitting, 413; smallest, 413; 
uniqueness theorem for smallest, 415; 
in the wider sense, 413 

Figure, 172; group of a, 173 

Fin, symbol, 134 

Finis superior (or supremum), 134n 

Finished proof, 46 

Finitary, 40 

Finite, 55, 102; ascending length, 495; 
above and below chain, 524; 
descending length, 495; extension, 
418; field, 440; group, 168; length, 
495; rank, division algebra of, 478; 
set, Dedekind definition of, 103n21; 
system of generators, 185 

Finitely generated, 185 

First: axiom for unions, 60; Cantor 
diagonal procedure, 55; domain, 62 

Fix-group, 221 

Flagged variables, 43 

Fonction polynome, 301 

“for all,” 12, 96 

Form(s): bilinear, 264, 268, 271; 
diagonal, of a matrix, 260; 
fundamental, 270; Hermitean, 271; 
Hesse normal, 30; linear, 234, 262, 
263, 345n48; multilinear, 264, 268; 
prenex normal, 518; propositional, 11, 
22, 94n2, 535; quadratic, 268; 
signature of a, 272 

Formalists, 6 

Formalization, 9 

Formally real (field), 464, 527 

Formula: of de Moivre, 458; 
inversion, of Möbius, 389; for 
rotations, Rodrigues’, 477 

Fraction(s), 122, 325; dyadic, 130n60; 
partial, 368; partial, decomposition 
into, 368; proper, 368 
continued, 333n29, 373; convergent of a, 
373; Hurwitz, 379; regular, 333n29, 373 

Free: group, 231; square-, 388; 
torsion-, 200; variables, 17; renaming 
of variables, 44n18 

FREGE, G., 10, 51, 72 

FROBENIUS, theorem of, 478 

Function, 64, 509; algebraic, 306; 
choice, 535; computable, 36; constant, 
291; continuous, 462; Dirac δ-, 5; 
of a domain, 50; elementary 
symmetric, 307n23; Euler, 382; 
exponential, 150; identical, 291; 
inverse, 64; Möbius, 288; partition, 
406; product, 38; range of a, 50; 
recursive, 35, 38; signs, 12, 15; sum, 
38; summatory, 371, 388; unity, 388; 
unity, 388; zero of a, 293. See also 
Entire rational function 

Fundamental: concepts, 22, 26; domain, 
411; form, 270; lemma of the theory 
of divisibility, 359; sequences, 139; 
sequences of Cantor, 133; system of 
solutions, 257; tensor, 270; theorem 
for cyclic groups, 192 


G, element, order of a group, 193 

GALOIS, 447; extension, 420; group, 
409, 434; theory, 409 

Gauss: number, 372; plane, 457 

Gaussian integers, 316 

Gebilde, 508 

General polynomial, 453 

Generalization of the prime number 
theorem, 406 

Generalized: digital sum, 386; predicate 
variable, 72 

Generated, 185; finitely, 185 

Generating system, 526 

Generators, 185; finite system of, 185 

GENTZEN, 43, 76; and QUINE, rules of 
inference, 43 

GIBBS, JOSIAH WILLARD, 476n 

Glide reflections, 205 

GÖDEL, 31, 38; completeness theorem, 
42; incompleteness theorem, 72; 
index, 36; numbers, 76 

Gödelization, 36 

GOLDBACH, conjecture of, 10, 76, 405 

Graphs, 186 

GRASSMAN, 275 

Greatest: common divisor, 332, 358; 
common lower element, 486; element, 
490, 523; lower bound, 68, 132, 486 

GRELLING, antinomy of, 85 

Ground set, 62 

Group(s), 167; Abelian, 111n33, 168; 
additive, of a ring, 318, 357; 
alternating, 227; automorphism, 
191; automorphism, of the 
configuration, 512; commutative, 
111n33, 168; commutator, 197, 
217; complexes of a, 182; 
composition of, 436; composition 
series of, 438; cyclic, 191; cyclic, 
fundamental theorem for, 192; direct 
product of, 394; dual, 483, 488, 517; 
factor, 196; of a figure, 173; fix-, 
221; free, 231; Galois, 409, 434; 
Hamiltonian, 194; Klein four-, 491; 
length of a, 216; of motions, 172; 
multiplication, 28; multiplication table 
for a, 188; multiplicative, of a field, 324, 
357; nilpotent, 220; with operators, 
518; order of a, 168; P-, 219, 220; 
planar rotation, 214; power of a, 192; 
quaternion, 194; simple, 196; 
structure problem for, 191; symmetric, 
171; theory, 28; topological, 232; 
torsion, 200; type problems for, 191; 
union of, 436 

Group element: exponent of a, 193; 
order of a, 193 


Half-turns, 205 

HAMEL, G., 146n88 

Hamiltonian groups, 194 

HANKEL, permanence principle of, 105n26 

HAUSDORFF-BIRKHOFF, maximal chain 
principle, 530, 533 

Hemihedrism, 209 

HERBRAND, 38 

Hereditary, 93; meet-, 518 

HERMES, H., 484 

Hermitian (HERMITE): bilinear form, 
271; dilative rotations, 469; form, 
271; matrix, 271; metric, 468; 
rotations, 468 

HERTZ, H., 6 

HESSE: diagram, 485; normal form, 30 

Heterologic, 85 

Heteronomous system of axioms, 27 

High, 529. See also Chain, high 

HILBERT, D., 6, 21, 406 

HÖLDER (and JORDAN), theorem of, 216, 
354, 504 

Holohedrism, 209 

Homogeneous, 305; system, 235 

Homologous, 511 

Homomorphism, 107, 212, 511; theorem, 
212; theorem, for rings, 342 

HORNER, rule of, 294, 295 

Hurwitz, continued fraction of, 379 

Hypercomplex system, 345 

Hypothesis: continuum, 60; 
continuum, special, 60; induction, 95 


Ideal(s), 338; basis of an, 339; classes, 
404; congruence modulo an, 341; 
direct sum of, 392; elimination, 354; 
left and right, 357; manifold of zeros 
of an, 354; maximal, 403; primary, 
342; prime, 342, 403; principal, 339; 
principal, ring, 339; theory, classical, 
343; two-sided, 357; unit, 338; zero, 
338 

Idealism, 3 

Idempotent, 69n, 392; ring, 69 

Identical: function, 291; permutation, 170 

Identification, 5 

Identitive, 63; law, 53 

Identity, 62, 321; element, 167; 
modular, 501 

“If and only if,” 14 

“If-then,” 14 

Image, 64, 212; domain, 247; pre-, 64; 
space, 247 

“Imaginary” element, 448 

Imaginary part, 458 

Implication, 14 

Impredicative, 85; definitions, 535 

Improper: real number, 136; subgroup, 
184 

Inclusion, 62 

Incompleteness: of arithmetic, 40; of 
extended predicate logic, 42; 
theorem of Gödel, 72 

Indecomposable, 199 

Independent: indeterminates, 304; 
linearly, 240, 526; transcendents, 304 

Indeterminates, 297; independent, 304 

Index: Gödel, 36; kernel-, notation, 
246; of a subgroup, 186 

Indirect proof, 42-43 

Individual(s), 21; domain of, 21 

Induction: A-, 43; axiom of, 94; 
complete, 57, 94, 117; complete 
starting from k, 101; hypothesis, 95; 
mathematical, 94; modified principle 
of, 101; schema, 75; step, 95; 
transfinite, 57 

Inductive set, 164 

Inertia, Sylvester’s law of, 272 

Inference, 41; system of natural, 42 

Inference, rules of, 41; complete system 
of, 41; of Gentzen and Quine, 43 

Infimum (or greatest lower bound), 132 

Infinite: countably, 151n90; decimal, 
130; distributive laws, 490 

Infinitely distant points, 5 

Infinitesimal, 139 

Infinity: actual, 7; axiom of, 60; 
Dedekind definition of, 54; potential, 7 

Initial: case, 95; intervals, 156; segments, 
156, 532 

Inner: composition, 510; product, 234, 
266, 269 

Integer(s), 109, 368; addition of, 111; 
algebraic, 401; algebraic, integral 
domain of, 330; basis, 402; Gaussian, 
316; module of, 112, 120; negative, 
113; positive, 113; ring of, 120 

Integral domain(s), 119n47, 323; of 
algebraic integers, 330 

Integrally closed ring, 403 

Intermediate value theorem, 462 

Interpretation(s), 20, 22; isomorphic, 30 

Intersection, 52, 59, 62, 483, 486, 487; 
of fields, 436; of subgroups, 436 

Interval(s): closed, 499; initial, 156; 
nested, 133 

Into, 64, 509, 510 

Intramathematical, 5 

Introduction: assumption-, 43, 45; 
existence-, 48n23 

Intuitionist(s), 6, 534 

Intuitive theory of sets, 51 

Invariant, 183; field, 402; subgroups, 194 

Inverse, 111n34, 123, 167; element, 111; 
function, 64; left, 167n2; mapping, 
510; right, 167n2 

Inversion formula of Möbius, 389 

Invertible, 510; mapping, 64; mapping, 
one-to-one, 64; transformation, 249 

Irrational numbers, 152n 

Irrationality of √2, 47 

Irreducible, 328, 357 

Irreducibility criterion of Eisenstein, 347 

Isobaric, 350 

Isolated (ordinal number), 161 

Isomorphic: interpretations, 30; 
relations, 64 

Isomorphism, 117, 156, 190, 412, 512; 
order-preserving, 144; theorem, 214 


JACOBI, symbol, 400 

Join, the, 486, 491 

JORDAN-HÖLDER, theorem of, 216, 354, 
504 


k-place predicate, 16 

KANT, IMMANUEL, 4 

Kernel, 212; -index notation, 246 

KLEIN-BARMEN, FRITZ, 483; (Klein) 
four-group, 491 

KNESER, H., 530 

KRONECKER, symbols of, 245 

Kummer, 403 

KURATOWSKI, lemma of, 524 


LAGRANGE, JOSEPH LOUIS, 186, 406; 
(Lagrange) relation, 398 

λ-introduction, 80 

Language(s): layer, second, 136n74; 
natural, 4, 9, 85 

LAPLACE, expansion, 283 

Lattice(s), 68, 360, 483, 487, 517; 
Boolean, 67, 484, 490, 495; 
complemented, 68; complete, 138n76, 
491; distributive, 68, 489; modular, 
501; points, 204; semi-, 487; set-, 
499; sub-, 492, 517; -theoretic 
connectives, 487; theory, 28 

Law: of the excluded middle, 8; of 
inertia (Sylvester), 272; reflexive, 53 

Leading coefficients, 299 

Least: common multiple, 359; common 
upper element, 486; element, 490, 
523; upper bound, 68, 486 

Left: cosets, 186; difference, 161; 
ideals, 357; inverse, 167n2; 
-multiplication in the domain of complex 
numbers, 457; residue classes, 186 

LEIBNIZ, GOTTFRIED WILHELM, 42 

Lemma: fundamental, of the theory of 
divisibility, 359; Kuratowski’s, 524; 
Zorn’s, 164, 522, 525 

Length: of a chain, 216; finite, 495; 
finite ascending, 495; finite 
descending, 495; of a group, 216 

Levels of real numbers, 8 

Lexicographic ordering, 130 

Lexicographically ordered, 159 

Liar: Antinomy of the, 76, 81; 
Paradox of the, 77 

Limit: number(s), 56, 161; of a sequence, 
132 

Line(s), 506; complex projective, 471 

Linear: combination, 239; mappings, 
234; order, 485; transformation, 246; 
transformation, ring of, 249 

Linear form(s), 234, 262, 345n48; module 
of, 263; multi-, 264, 268 

Linearly: dependent, 240; independent, 
240, 526; ordered, 522 

Little Desargues theorem, 482 

Logarithm, 151 

Logic: Algebra of Logic (Boole), 42; 
classical, 4; of the first order, 73; 
history of, 9; operator in, 16; of the 
second order, 23n72. See also 
Predicate, logic 

Logical: connectives, 487; matrix 
(truth table), 12; symbols, 484 

Logicism, 51 

Logics, many valued, 10 

Longitudinal reflections, 205 

LORENZEN, P., 40, 72, 79, 94n3, 536 

LÖWENHEIM, and SKOLEM, theorem of, 71 

Lower: bound, 523; bound, greatest, 
68; element, greatest common, 486; 
neighbor, 485, 523 

Lowest: common denominator, 360; 
terms, 360 


Manifold(s): algebraic, 354; of zeros of 
an ideal, 354 

Many-place properties, 21 

Mapping(s), 64, 509; bilinear, 234; 
inverse, 510; invertible, 64; invertible, 
one-to-one, 64; linear, 234; 
normalization of a, 280; onto, 395n; 
rigid, 172n 

Mathematical: induction, 94; system, 508 

Matrix(ces), 234; addition of, 251; 
characteristic equation of a, 286; 
coefficient, 259; conjugate transposed, 
270; diagonal form of a, 260; echelon, 
259; equivalence of, 254; extended, 
259; Hermitian, 271; logical, 12; 
multiplication of, 177, 252; rank of a, 
255, 260, 284; column rank of a, 
253; skew-symmetric, 314; square, 
177; of a transformation, 250; unit, 
273 

Maximal, 525; condition, 360; element, 
164, 523; ideal, 403; segment, 157; 
subgroup, 186 

Maximal chain, 504; principle, Hausdorff- 
Birkhoff, 530, 533 

Mechanics, quantum, 10 

Meet, 487; -hereditary, 518 

MERSENNE numbers, 371 

Meta-metalanguage, 85 

Metalanguage, 84 

Metamathematics, 3-4 

Metric: Hermitian, 468; space, 270; 
structure, 270 

Minimal: element, 523; subgroup, 186 

MÖBIUS: function, 388; inversion 
formula, 389 

Model, 23, 516 

Modified principle of induction, 101 

Modular: identity, 501; lattice, 501; 
semi-, 503; semi-, above, 503; semi-, 
below, 503 

Module, 111, 318n4; of integers, 112, 
120; of linear forms, 263; ordered, 
120; complete ordered, 138; property, 
338 

Modulo, congruence, an ideal, 341; a 
subgroup, 380 

Modulo n, reduced, 174 

Modulus, 458 

Modus ponens, 41 

Monomorphic, 73, 521; system of 
axioms, 30 

Monotone endomorphism, 146 

Monotonic law for multiplication, 120 

Monotonicity of addition, 100 

Motions: group of, 172; proper, 214; 
spiral, 205 

MOUFANG, R., 482 

Multilinear forms, 264, 268 

Multiple: least common, 359; zero, 303 

Multiplication, 99; algorithm for, 34; 
componentwise, 392; of 
determinants, 280; of 
endomorphisms, 115; group, 28; 
left-, in the domain of complex 
numbers, 457; of matrices, 177, 252; 
monotonic law for, 120; of ordinal 
numbers, 160; of real numbers, 146; 
table, 345; table, for a group, 188; of 
transformations, 248; of vectors, 267 

Multiplicative: group of a field, 324, 
357; semigroup of a ring, 357 

Multiplicity of a zero, 302; criterion for, 
426 


Naive set theory, 51, 534 

Natural: inference, system of, 42; 
languages, 4, 9, 85; number, 72; 
numbers, totality of, 72; science, 27 

Negation, 13 

Negative integers, 113 

Neighbor: lower, 485, 523; upper, 523 

Nested intervals, 133 

Neutral element, 111, 167, 237 

Nilpotent: divisor of zero, 342; groups, 
220 

Noetherian rings, 339 

Nominalism, 3 

Nonelementary: composition, 510; 
structure, 521 

Nonseparable polynomial, 424 

Norm, 403, 460, 470, 481 

Normal, 183; extensions, 420; form, 
prenex, 518; polynomial, 453; 
subgroups, 188, 194, 518 

Normality conditions, 265 

Normalization of a mapping, 280 

Normalized, 370 

Normalizer, 218 

Normed, 449 

“not,” 12, 13 

Notation: autonomous, 10n; kernel- 
index, 246 

nth root, 148 

Number(s): A, 77; abundant, 371; 
algebraic, 401; amicable, 372; Betti, 
203; cardinal, 54, 94; Cayley, 481; 
Cayley conjugate, 481; class, 56, 404; 
deficient, 371; Fermat, 372; Gauss, 372; 
Gödel, 76; irrational, 152n; limit, 56, 
161; Mersenne, 371; natural, 72; 
natural, totality of, 72; and numerals, 
39n14; perfect, 7, 371; sequence of, 
104; sum of, 95; theory, analytic, 406; 
torsion, 203; transcendental, 401 

Number, rational, 122; field of, 124; 
positive, 125 
See also Complex, numbers; 
Ordinal number(s); Prime, number; 
Real, number(s) 

Numerals, and numbers, 39n14 

Numerator, 122 


0-place predicates, 16 

Octaves, Cayley, 481 

One-to-one, 510; (invertible) mapping, 64 

Onto, 64, 509; mapping, 395n 

Operation: binary, 167; outer, 510 

Operator(s): description, 12, 18; 
domains, 511; group with, 518; in 
logic, 16; of set formation, 12 

“Or,” 12, 13, 67, 483 

Order(s), 32, 485, 522; basis of Ath, 
405; diagram, 485; of element G, 
193; of a group, 168; of a group 
element, 193; linear, 485; -preserving 
isomorphism, 144; set-, 523; type, 56 

Orderable, 527 

Ordered, 524; field, 527; linearly, 522; 
module, 120; module, complete, 138; 
pairs, 52; ring, 120. See also Sets, 
ordered 

Ordering(s), 63; Archimedean, 127; 
lexicographic, 130; partial, 63; 
quasi-, 356; semi-, 63. See also 
Well-ordering(s) 

Ordinal number(s), 56, 153, 158; 
isolated, 161; multiplication of, 160; 
of the first kind, 161; of the second 
kind, 161; sum of, 159 

Ornament(s), 204; elementary, 204 

Orthogonal, 273, 285; transformations, 
273 

Orthogonality conditions, 265 

Outer: operation, 510; product, 235, 275 


P-group(s), 219; Sylow, 220 

Pairs, ordered, 52 

Paradox, 80; of the Liar, 77 

Part: imaginary, 458; preperiodic, 378; 
real, 458; scalar, 475; vector, 475 

Partial: fractions, 368; orderings, 63; 
well-orderings, 64 

Partition function, 406 

PASCAL, triangle, 296 

PEANO, GIUSEPPE, 72, 94; system, 9, 30; 
system of axioms, 93 

PELL, equation, 397 

Perfect, 198; number, 7, 371 

Period, primitive, 378 

Permanence principle of Hankel, 105n26 

Permutable element, 168 

Permutation(s), 169; identical, 170 

PHILON, 14 

Pigeon-hole principle, 102, 465 

Planar rotation group, 214 

Plane: affine complex, 468; 
dilatations of the, 456; Gauss, 457; 
rotation of the, 456 

PLATO, 4 

POINCARÉ, H., 535 

Point(s), 169, 490, 506; infinitely distant, 
5; lattice, 204 

Polygon, constructibility of a regular, 454 

Polynomial(s), 299, 304; cyclotomic, 
428, 430; derivative of a, 423; 
nonseparable, 424; elementary 
symmetric, 307; general, 453; normal, 
453; primitive, 335, 449; ring, 299, 
304; separable, 422; value of a, 300; 
zero of a, 302 

Positive: definite, 273; integers, 113; 
rational numbers, 125 

Positivity, domain of, 120, 464, 527 

Potential infinity, 7 

Power, 125; of a group, 192; rule, 200; 
set, 52, 54, 485; -series, 308; -series field, 
311; -series ring, 311; set axiom, 60 

Pre-image, 64 

Predicate(s), 12, 15-16, 21; k-place, 16; 
0-place, 16; two-place, 16; variable, 11, 
22; variable, generalized, 72 

Predicate logic: completeness of, 42; 
elementary, 73; extended, 23n, 72; 
incompleteness of extended, 42 

Prenex normal form, 518 

Preperiod, 387 

Preperiodic part, 378 

Primary, 496; ideal, 342 

Prime, 329; divisor, 358; element, 403; 
field, 324; ideal, 342, 403; number, 
75; number theorem, generalization 
of, 406; regular, 405 

Primitive: period, 378; polynomial, 335, 
449; root, 399; root of unity, 427 

Principal ideal(s), 339; ring, 339 

Principia Mathematica (Whitehead and 
Russell), 42 

Principle: of duality, 68; of 
extensionality, 51, 58; Hausdorff- 
Birkhoff maximal chain, 530, 533; 
high chain, 522, 530; high chain, 
sharpened, 530n; permanence 
(Hankel), 105; pigeonhole, 102, 465; 
of recursion, 96; Russell’s vicious 
circle, 535; of two-valuedness, 10, 21 

Problem(s): type and structure, for 
groups, 191; Waring, 406; word, 35, 232 

Product(s), 167; alternating, 235, 275; 
complex-, 182, 266; direct, 198, 493; 
direct, of groups, 394; function, 38; 
inner, 234, 266, 269; outer, 235, 275; 
quaternion, 475; relative, 62; scalar, 
234, 266, 269; tensor, 235, 273; 
vector, 266 

Projection, 255 

Projective line, complex, 471 

Proof, 41, 43; algorithmic, 33; 
finished, 46; indirect, 42-43 

Proper: chains, 495; divisor, 357; 
divisor chain, 330; factor, 328; 
fraction, 368; motions, 214; segments, 
156; subset, 52, 95n5; upper bound, 
523; variable, 34 

Properly simple, 196 

Property(ies), 21; many-place, 21; 
module, 338; representing, 535; 
two-place, 21 

Proposition, relevant, 29 

Propositional: constants, 12; form(s), 
11, 22, 94n2, 535; variables, 15 

Propositions, 10, 11 

“Protologic,” 79 

Pure: congruences, 399; -elementary 
structures, 521 
Pythagorean: theorem, 4, 27; triples, 398 


Quadratic: form, 268; reciprocity, 400; 
residues, 399; residue, Euler criterion 
for a, 400 

Quantifier(s), 12, 16; existential, 17, 18; 
universal, 17, 18 

Quantum mechanics, 10 

Quasi-ordering, 356 

Quaternion(s), 317n1, 470; conjugate, 
470; group, 194; product, 475; 
skew field of, 470 

QUINE, W. V., and GENTZEN, 43 

Quotient(s), 125; field, 125, 325; ring, 368 


Radical(s): over a field, 452; 
solvability by, 452 

Ramified analysis, 40 

Range (of a function), 50 

Rank: of a matrix, 255, 260, 284; 
column, of a matrix, 253; of a 
transformation, 247 

Rational numbers, 122; field of, 124; 
positive, 125 

Real: -closed, 463; part, 458 

Real number(s), 134, 135; field of, 141; 
improper, 136; levels of, 8; 
multiplication of, 146 

Realism, 3 

Reciprocal, 123 

Reciprocity, quadratic, 400; 
generalized law of, 400 

“Rectangular array,” 250 

Recursion, principle of, 96 

Recursive: definition, 96; definition of 
addition, 98n12; function, 35, 38 

Recursively enumerable, 33, 35 

Reduced: modulo n, 174; remainder, 
174 

Reducible, 328 

Reflection(s), 205; glide, 205; 
longitudinal, 205; rotatory, 206; 
transverse, 205 

Reflexive, 63; law, 53 

Reflexivity, 29 

Regressions, 157 

Regular, 253; continued fraction, 
333n29, 373; prime, 405; 
representation, 223 

Relation(s), 21, 509; congruence, 65, 
514; connex, 63; converse, 62; 
empty, 62; equivalence, 29, 65, 108; 
field of, 63; identitive, 63; 
isomorphic, 64; Lagrange, 398; 
reflexive, 63; successor, 32; symmetric, 
63; theory of, 9, 61; transitive, 63; 
universal, 62; void, 62 


Relative: complement, 499; 
consistency, 31; product, 62 

Relatively complemented, 499 

Relevant: expression, 76n; proposition, 29 

Remainder: reduced, 174; theorem 
(Chinese), 391 

Replacement axiom, 60 

Representation, 220; degree of a, 220; 
regular, 223; transitive, 222 

Representing property, 535 

Residue(s): class ring, 342, 381; 
classes, 109, 186, 341, 461; quadratic, 
399; quadratic, Euler criterion for a, 
400; system, complete, 381 

Restriction against circularity, 43 

Resultant, 349 

RIEMANN, sphere, 472 

Right: cosets, 186; difference, 161; 
ideals, 357; inverse, 167n2; residue 
classes, 186 

Rigid mapping, 172n 

Ring(s), 116; additive group of a, 318, 
357; Boolean, 69; commutative, 117, 
317; of endomorphisms, 116; 
Euclidean, 332, 361; extension, 297; 
factorization theorem for, 343; 
homomorphism theorem for, 342; of 
integers, 120; integrally closed, 403; 
of linear transformations, 249; 
multiplicative semigroup of a, 357; 
Noetherian, 339; ordered, 120; 
polynomial, 299, 304; power-series, 
311; principal ideal, 339; quotient, 
368; residue class, 342, 381; theory, 
28; unique factorization, 331 

RODRIGUES, formula for rotations, 477 

Roof, 529 

Root, 302n15; mth, 148; primitive, 399; 
square, 148; of unity, 425; exponent 
of a, 445; primitive, 427 

Rotation(s), 205; dilative, 457; dilative, 
Hermitian, 469; Hermitian, 468; 
of the plane, 456; Rodrigues formula 
for, 477; of the sphere, 472 

Rotatory reflections, 206 

Rule(s): of an algorithm, 33; Horner’s, 
294, 295; of inference, 41, 43; power, 
200; of separation, 41 

RUSSELL, BERTRAND, 22, 51; antinomy, 
59, 81; vicious circle principle, 535; 
and A. N. Whitehead, 42 


Scalar(s): domain of, 235; part, 475; 
product, 234, 266, 269 

Schema: axiom, 26, 34; induction, 75 

SCHNIRELMANN, 405; basis theorem of, 
407 
SCHOUTEN, 246, 262 

SCHREIER, ARTIN-, theorem of, 527 

Science: abstract, 27; natural, 27 

Second: axiom for unions, 60; Cantor 
diagonal procedure, 55; domain, 62; 
language layer, 136n74 

Segment(s), 102, 156, 520; initial, 156, 
532; maximal, 157; proper, 156 

Self-contradictory system of axioms, 31 

Self-dual, 501 

Semantic(s), 20; antinomy, 81; 
consistency, 31 

Semi-orderings, 63 

Semilattice, 487 

Semimodular: above, 503; below, 503 

Sense (Sinn), 10 

Separability, criterion for, 424 

Separable: extensions, 422; 
polynomial, 422 

Separation, rule of, 41 

Sequence(s), 64; fundamental, of Cantor, 
133; Cauchy or fundamental, 139; 
diagonal, 152; limit of a, 132; of 
numbers, 104; zero, 140 

Series: composition, 216, 354; of fields 
and groups, 438; power-, 308; power-, 
field and ring, 311 

Set(s), 50; algebra of, 53; axiom for, 
with one element, 60; cardinality of a, 
54; directed, 64; element of a, 51; 
empty, 52, 103; empty, axiom for the, 
60; equivalence of, 103; extension of 
a, 51; field of, 53; finite (Dedekind 
definition), 103n21; formation, 
operator of, 12; ground, 62; 
-lattice, 499; -orders, 523; power, 
52, 54, 485; power, axiom, 60; 
-theoretic connectives, 487; universal, 
52, 57; universal, antinomy of the, 57 

Sets, ordered, 50, 56, 522; linearly, 522; 
totally, 522, 533; well-, 153, 155, 532 

Sets, theory of, 3-4, 9; class in, 58; 
naive or intuitive, 51, 534 

Sharpened high chain principle, 530n 

Signature of a form, 272 

Signs, function, 12, 15 

Similar, 56, 221, 226, 262 

Similarity, 156 

Simple, 228; group, 196; properly, 196 

Skew field, 235, 324, 526n; of 
quaternions, 470 

Skew-symmetric: matrix, 314; tensor, 276 

SKOLEM, and LÖWENHEIM, theorem of, 71 

Smallest splitting field, 413; 
uniqueness theorem for, 415 

Solutions, fundamental system of, 257 

Solvability, by radicals, 452 
Solvable, 216 

Space, 169; dual, 263; image, 247; 
metric, 270; vector, 233, 238, 526n; 
vector, dual, 234 

Spanned subspace, 243 

Special continuum hypothesis, 60 

Sphere: Riemann, 472; rotations of 
the, 472 

Spiral motions, 205 

Splitting field, 413; smallest, 413; 
smallest, uniqueness theorem for, 415; 
in the wider sense, 413 

Square: -free, 388; matrix, 177; root, 148 

Statements, valid, 516 

Stoics, 14 

Structure(s), 5, 508, 515, 516; 
constants, 402-403n37; elementary- 
arithmetical, 521; elementary-logical, 
521; metric, 270; nonelementary, 521; 
problem for groups, 191; 
pure-elementary, 521; type, 516 

STURM: chain, 463; theorem, 463 

Subband(s), 492, 517 

Subconfiguration, 517 

Subdeterminant, algebraic complement 
of a, 283 

Subfield, 297n11 

Subgroup(s), 184; admissible, 518; 
criterion for, 184; congruence 
modulo a, 380; improper, 184; index 
of a, 186; intersection of, 436; 
invariant, 194; maximal, 186; 
minimal, 186; normal, 188, 194, 518; 
proper, 184; trivial, 184 

Subideal, 339n 

Subject(s), 12, 15, 21; variables, 11 

Sublattice, 492, 517 

Subring, 322, 338 

Subset, 52; proper, 52, 95n5 

Subspace, 526n; spanned, 243; vector, 
243, 526n 

Substitution, 168, 300 

Successive application, 115 

Successor, 38, 56, 93, 94; relation, 32 

Sum, 133; digital, 385, 386; direct, of 
the ideals, 392; of endomorphisms, 
114; function, 38; of numbers, 95; 
of ordinal numbers, 159 

Summation convention, Einstein, 239 

Summatory function, 371, 388 

Superfluous elements, 497 

SUZUKI, M., 519 

SYLOW: p-groups, 220; theorem, 219 

SYLVESTER: determinant, 349; law of 
inertia, 272 

Symbol(s): fin, 134; Jacobi, 400; 
Kronecker, 245; logical, 485 
Symmetric, 63, 108, 270, 306; group, 
171; skew-, matrix, 314; skew-, 
tensor, 276 
elementary: functions, 307n23; 
polynomials, 307 

Symmetry, 29 

Syntactic: antinomy, 81; 
consistency, 31 

Syntax, 20 


TARSKI, A., 20 

Tautology, 23, 24-25 

Tensor: covariant, 264; fundamental, 270; 
product, 235, 273; skew-symmetric, 276 

Terms, lowest, 360 

Tertium non datur, 43, 47, 49 

Tetartohedrism, 209 

THALES, 27 

Theorem: fundamental, of algebra, 467; 
Artin-Schreier, 527; Bernstein 
equivalence, 54; Bézout, 354; 
binomial, 295; Cantor fundamental, 
163; chain, of Dedekind, 504; 
Chinese remainder, 391; 
completeness, of Gödel, 42; for 
complex numbers, fundamental 
algebraic, 467; for complex numbers, 
fundamental topological, 467n8; 
fundamental, for cyclic groups, 192; 
little, of Desargues, 482; 
factorization, for rings, 343; 
Fermat, 382; of Frobenius, 478; 
incompleteness (Gödel), 72; 
intermediate value, 462; isomorphism, 
214; Jordan-Hölder, 216, 354, 504; 
of Löwenheim and Skolem, 71; 
prime number, 406; Pythagorean, 
4, 27; Schnirelmann basis, 407; 
Sturm, 463; Sylow, 219; uniqueness, 
for smallest splitting fields, 415; 
well-ordering, 56, 525; Wilson’s, 384 

Topological: group, 232; theorem for 
complex numbers, fundamental, 467n8 

Torsion: -free, 200; group, 200; 
numbers, 203 

Totality of natural numbers, 72 

Totally ordered, 522, 533 

Trace, 286 

Transcendent(s), 296; independent, 304 

Transcendental number, 401 

Transfinite, 55; induction, 57; 
inductive definition, 57 

Transformation(s): addition of, 247; 
invertible, 249; linear, 246; linear, 
ring of, 249; matrix of a, 250; 
multiplication of, 248; orthogonal, 
273; rank of a, 247 
Transforms, 183 

Transitive, 63, 108; law, 53; 
representation, 222 

Transitivity, 29, 100; domain of, 221 

Translations, 205 

Transposition(s), 226; even, 227 

Transverse reflections, 205 

Treillis, 483 

Triangle, 92; Pascal, 296 

Triples, Pythagorean, 398 

Trisection of an angle, 417 

Trivial: divisors, 357; subgroup, 184 

True, 10, 23 

Truth: table (logical matrix), 12; 
-value, 107, 109 

TURING, 36 

Two-place: predicate, 16; property, 21 

Two-sided: division, 180; ideals, 357 

Two-valuedness, principle of, 10, 21 

Type: order, 56; problems for groups, 
191; structure, 516 


Uncountable, 55, 151 

Unimodular, 399n 

Union(s), 52, 62, 483, 486; first and 
second axioms for, 60; of fields, 436; 
of groups, 436 

Unique factorization rings, 331 

Uniqueness theorem for smallest 
splitting fields, 415 

Unit(s), 327, 356; element, 66, 116, 167, 
179, 321, 490; ideal, 338; matrix, 273; 
vector, 285 

Unitary, 273 

Unity: element, 321; function, 388; 
root of, 425; exponent of a, 445; 
primitive, 427 

Universal: class, 59; quantifier, 17, 18; 
relation, 62; set, 52, 57; set, 
antinomy of, 57 

Upper: bound, 486, 523; bound, least 
and proper, 68, 486, 523; neighbor, 
523 


Valid statements, 516 

Valuation, 310n, 405 

Value: absolute, 128, 310n, 458; 
equality of, 122; of a polynomial, 
300; theorem, intermediate, 462; 
truth-, 107, 109 

Variable(s), 11; bound, 17, 23, 60; 
flagged, 43; free, 17; free renaming 
of a, 44n18; predicate, 11, 22; 
predicate, generalized, 72; proper, 
34; propositional, 15; subject, 11 

Vector(s), 233; basis, 234; 
contravariant, 264; covariant, 264; 
multiplication of, 267; part, 475; 
product, 266; space, 233, 238, 526n; 
space, dual, 234; subspace, 243, 526n; 
unit, 285 

Vel, 13 

Verband, 483 

Verknüpfung, 510 

Vicious circle principle (Russell’s), 535 

VINOGRADOV, 405 

Void relation, 62 

VON NEUMANN, 51, 58 


WARING, problem, 406 

Well-ordered sets, 153, 155, 532 

Well-ordering(s), 64, 101; axiom, 164; 
partial, 64; theorem, 56, 525 

WESTON, J. D., 530 

WHITEHEAD, A. N., and RUSSELL, 42 

WILSON, theorem of, 384 

Word(s), 6, 230; empty, 231; problem, 
35, 232 


ZERMELO, E., 51, 535; axiom of choice, 
164 

Zero(s), 111; aleph-, 55; divisors of, 
119, 293, 322; nilpotent divisor of, 
342; element, 179, 490; of a function, 
293; ideal, 338; manifold of, of an 
ideal, 354; multiple, 303; multiplicity 
of a, 302, 426; of a polynomial, 302; 
sequence, 140 

ZORN, lemma of, 164, 522; sharpened 
form of, 525 


